Brief. Bioinformatics - Data construction for phosphorylation site prediction.

Topics

{ method(1969) cluster(1462) data(1082) }
{ model(2656) set(1616) predict(1553) }
{ measur(2081) correl(1212) valu(896) }
{ use(1733) differ(960) four(931) }
{ model(3404) distribut(989) bayesian(671) }
{ perform(1367) use(1326) method(1137) }
{ method(2212) result(1239) propos(1039) }
{ featur(3375) classif(2383) classifi(1994) }
{ state(1844) use(1261) util(961) }
{ estim(2440) model(1874) function(577) }
{ treatment(1704) effect(941) patient(846) }
{ research(1085) discuss(1038) issu(1018) }
{ data(3008) multipl(1320) sourc(1022) }
{ survey(1388) particip(1329) question(1065) }
{ studi(2440) review(1878) systemat(933) }
{ import(1318) role(1303) understand(862) }
{ research(1218) medic(880) student(794) }
{ health(1844) social(1437) communiti(874) }
{ bind(1733) structur(1185) ligand(1036) }
{ model(2220) cell(1177) simul(1124) }
{ studi(1119) effect(1106) posit(819) }
{ can(774) often(719) complex(702) }
{ imag(1057) registr(996) error(939) }
{ method(1219) similar(1157) match(930) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ extract(1171) text(1153) clinic(932) }
{ general(901) number(790) one(736) }
{ featur(1941) imag(1645) propos(1176) }
{ age(1611) year(1155) adult(843) }
{ group(2977) signific(1463) compar(1072) }
{ activ(1138) subject(705) human(624) }
{ result(1111) use(1088) new(759) }
{ decis(3086) make(1611) patient(1517) }
{ activ(1452) weight(1219) physic(1104) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ sequenc(1873) structur(1644) protein(1328) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ learn(2355) train(1041) set(1003) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ system(1050) medic(1026) inform(1018) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ process(1125) use(805) approach(778) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Protein phosphorylation is one of the most pervasive post-translational modifications and regulates diverse cellular processes across organisms. Because mass spectrometry-based experimental approaches for identifying phosphorylation events are resource-intensive, many computational methods have been proposed that formulate phosphorylation site prediction as a classification problem. These methods differ in several ways, and one crucial issue is how training and test data are constructed for unbiased performance evaluation. In this article, we categorize the existing data construction methods and try to answer three questions: (i) Are different data construction methods equivalent for assessing phosphorylation site prediction algorithms? (ii) What kind of test data set is unbiased for assessing the prediction performance of a trained algorithm in different real-world scenarios? (iii) Among the summarized training data construction methods, which have better generalization performance in most scenarios? To answer these questions, we conduct comprehensive experimental studies for both non-kinase-specific and kinase-specific prediction tasks. The experimental results show that: (i) different data construction methods can lead to significantly different prediction performance; (ii) different test data construction methods can each be unbiased with respect to a different real-world scenario; and (iii) the generalization performance of a data construction method depends on the real-world scenario. Therefore, when developing new algorithms, researchers should first determine which scenario their algorithm is intended for, what the corresponding unbiased test data are, and which training data construction method yields the best generalization performance.
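To make the idea of "data construction" concrete, the following is a minimal Python sketch of one possible scheme; it is not the procedure used in the article. It assumes that candidate sites are S/T/Y residues, that each site is represented by a fixed-width sequence window, that unannotated S/T/Y residues are treated as negatives, and that training and test data are separated at the protein level. The function names, the window half-width of 7, and the toy sequences are all hypothetical.

```python
import random

# Hypothetical toy data: protein id -> (sequence, 0-based phosphosite positions).
TOY_PROTEINS = {
    "P1": ("MKSPTAYLLS", {2, 6}),
    "P2": ("ASSTGGYK", {1}),
    "P3": ("TTKLSY", {0, 5}),
    "P4": ("GGSAAT", {2}),
}


def extract_windows(protein_id, sequence, phospho_positions,
                    half_width=7, residues="STY"):
    """Return (protein_id, window, label) for every S/T/Y residue.

    Annotated positions get label 1; all other S/T/Y residues are
    assumed to be negatives (label 0), which is only an assumption.
    """
    # Pad both ends so that windows near the termini keep a fixed length.
    padded = "-" * half_width + sequence + "-" * half_width
    samples = []
    for i, residue in enumerate(sequence):
        if residue not in residues:
            continue
        window = padded[i:i + 2 * half_width + 1]
        label = 1 if i in phospho_positions else 0
        samples.append((protein_id, window, label))
    return samples


def protein_level_split(proteins, test_fraction=0.25, seed=0):
    """Split by protein so no protein contributes to both sets."""
    ids = sorted(proteins)
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    test_ids = set(ids[:n_test])
    train, test = [], []
    for pid in ids:
        sequence, sites = proteins[pid]
        target = test if pid in test_ids else train
        target.extend(extract_windows(pid, sequence, sites))
    return train, test


def balance_negatives(samples, seed=0):
    """Downsample negatives to at most the number of positives."""
    positives = [s for s in samples if s[2] == 1]
    negatives = [s for s in samples if s[2] == 0]
    if len(negatives) > len(positives):
        negatives = random.Random(seed).sample(negatives, len(positives))
    return positives + negatives


train, test = protein_level_split(TOY_PROTEINS)
balanced_train = balance_negatives(train)
print(len(train), len(balanced_train), len(test))
```

In this sketch the test set keeps its natural positive-to-negative ratio while the training set is balanced. Whether that pairing is appropriate depends on the intended scenario, which is the kind of choice the abstract argues should be made explicitly.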

Cleaned Abstract

protein phosphoryl one pervas posttransl modif regul divers cellular process various organ mass spectrometrybas experiment approach identifi phosphoryl event resourceintens mani comput method propos phosphoryl site predict formul classif problem differ sever way one crucial issu construct train data test data unbias perform evalu articl categor exist data construct method tri answer three question equival use differ data construct method assess phosphoryl site predict algorithm ii kind test data set unbias assess predict perform train algorithm differ real world scenario iii among summar train data construct method one better general perform scenario answer question conduct comprehens experiment studi nonkinasespecif kinasespecif predict task experiment result show differ data construct method can lead signific differ predict perform ii can differ test data construct method unbias respect differ real world scenario iii differ data construct method differ general perform differ real world scenario therefor develop new algorithm futur research peopl concentr kind scenario algorithm will work correspond unbias test data train data construct method can generat best general perform

Similar Abstracts

IEEE Trans Neural Netw Learn Syst - Improved Fault Classification in Series Compensated Transmission Line: Comparative Evaluation of Chebyshev Neural Network Training Algorithms. ( 0,659809460149039 )
J Chem Inf Model - Does rational selection of training and test sets improve the outcome of QSAR modeling? ( 0,650450559433383 )
J Chem Inf Model - Metabolism site prediction based on xenobiotic structural formulas and PASS prediction algorithm. ( 0,646494291169358 )
Neural Comput - A nonparametric clustering algorithm with a quantile-based likelihood estimator. ( 0,642883727817534 )
Int J Health Geogr - Detection of arbitrarily-shaped clusters using a neighbor-expanding approach: a case study on murine typhus in south Texas. ( 0,621298451772568 )
Spat Spatiotemporal Epidemiol - Optimal selection of the spatial scan parameters for cluster detection: a simulation study. ( 0,61745738183234 )
AMIA Annu Symp Proc - Using hierarchical mixture of experts model for fusion of outbreak detection methods. ( 0,61726174411118 )
Int J Health Geogr - Detecting activity locations from raw GPS data: a novel kernel-based algorithm. ( 0,606416545876846 )
J Chem Inf Model - Comparison of combinatorial clustering methods on pharmacological data sets represented by machine learning-selected real molecular descriptors. ( 0,602175698344662 )
J Chem Inf Model - Investigation of the use of spectral clustering for the analysis of molecular data. ( 0,600421496978255 )
Int J Health Geogr - A binary-based approach for detecting irregularly shaped clusters. ( 0,597793961794124 )
J. Comput. Biol. - Inconsistent Denoising and Clustering Algorithms for Amplicon Sequence Data. ( 0,597543969364885 )
Med Decis Making - Developing appropriate methods for cost-effectiveness analysis of cluster randomized trials. ( 0,594730702414761 )
J Chem Inf Model - Study of chromatographic retention of natural terpenoids by chemoinformatic tools. ( 0,591009390331781 )
J Biomed Inform - Learning Bayesian networks from survival data using weighting censored instances. ( 0,583159444217168 )
J Chem Inf Model - Consensus methods for combining multiple clusterings of chemical structures. ( 0,582490376097175 )
Int J Health Geogr - Using statistical methods and genotyping to detect tuberculosis outbreaks. ( 0,581743923695814 )
J. Comput. Biol. - A geometric clustering algorithm with applications to structural data. ( 0,575240846513225 )
Comput Math Methods Med - Decimative spectral estimation with unconstrained model order. ( 0,566811060732008 )
Artif Intell Med - Support vector methods for survival analysis: a comparison between ranking and regression approaches. ( 0,565121987357976 )
Int J Health Geogr - Detection of clusters of a rare disease over a large territory: performance of cluster detection methods. ( 0,564791741335039 )
Med Decis Making - Multiple imputation methods for handling missing data in cost-effectiveness analyses that use data from hierarchical studies: an application to cluster randomized trials. ( 0,563156700673203 )
J Biomed Inform - Screening drug target proteins based on sequence information. ( 0,560377771770181 )
AMIA Annu Symp Proc - Motivating the additional use of external validity: examining transportability in a model of glioblastoma multiforme. ( 0,559903989690717 )
IEEE J Biomed Health Inform - Red blood cell cluster separation from digital images for use in sickle cell disease. ( 0,559660879105282 )
IEEE Trans Pattern Anal Mach Intell - A Link-Based Approach to the Cluster Ensemble Problem. ( 0,556882980861323 )
Comput Methods Programs Biomed - Fuzzy and hard clustering analysis for thyroid disease. ( 0,556875481156146 )
Res Synth Methods - Synthesizing regression results: a factored likelihood method. ( 0,555505641062202 )
IEEE Trans Vis Comput Graph - GPU-based Multilevel Clustering. ( 0,543458313742195 )
J Med Syst - Application of attribute weighting method based on clustering centers to discrimination of linearly non-separable medical datasets. ( 0,541239178688684 )
J Chem Inf Model - Algorithm for reaction classification. ( 0,540932391573323 )
J Chem Inf Model - Beyond the scope of Free-Wilson analysis: building interpretable QSAR models with machine learning algorithms. ( 0,540889823198532 )
Artif Intell Med - Vicinal support vector classifier using supervised kernel-based clustering. ( 0,536748951844143 )
Comput Math Methods Med - Feature selection for better identification of subtypes of Guillain-Barré syndrome. ( 0,535405887435164 )
J Integr Bioinform - An evolutionary and visual framework for clustering of DNA microarray data. ( 0,535182413317827 )
J Chem Inf Model - GRID-based three-dimensional pharmacophores II: PharmBench, a benchmark data set for evaluating pharmacophore elucidation methods. ( 0,534803001652918 )
J Chem Inf Model - Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. ( 0,531683043248909 )
J. Comput. Biol. - Enhancing Gibbs sampling method for motif finding in DNA with initial graph representation of sequences. ( 0,529795698233297 )
Med Biol Eng Comput - Validating motor unit firing patterns extracted by EMG signal decomposition. ( 0,528773070421798 )
IEEE Trans Image Process - Statistical modeling of 3-D natural scenes with application to Bayesian stereopsis. ( 0,527864432614339 )
Lifetime Data Anal - Efficiency improvement in a class of survival models through model-free covariate incorporation. ( 0,526621898865861 )
J. Comput. Biol. - EDAR: an efficient error detection and removal algorithm for next generation sequencing data. ( 0,526457510682605 )
J Clin Monit Comput - Comparison of two different generations of NIRS devices and transducers in healthy volunteers and ICU patients. ( 0,525433707984486 )
Int J Health Geogr - Gumbel based p-value approximations for spatial scan statistics. ( 0,52456790328107 )
Med Decis Making - Cost-saving tree-structured survival analysis for hip fracture of study of osteoporotic fractures data. ( 0,524199103854918 )
IEEE Trans Pattern Anal Mach Intell - Probabilistic Common Spatial Patterns for Multichannel EEG Analysis. ( 0,523105294462574 )
Comput Methods Programs Biomed - Generalized rough fuzzy c-means algorithm for brain MR image segmentation. ( 0,522898851921149 )
J Chem Inf Model - In silico prediction of chemical acute oral toxicity using multi-classification methods. ( 0,522621208812807 )
Med Biol Eng Comput - Discrimination power of long-term heart rate variability measures for chronic heart failure detection. ( 0,520844826015466 )
Med Biol Eng Comput - Cardiogoniometric parameters for detection of coronary artery disease at rest as a function of stenosis localization and distribution. ( 0,519928256276926 )
Brief. Bioinformatics - Iteratively reweighted LASSO for mapping multiple quantitative trait loci. ( 0,519077472661343 )
J Chem Inf Model - RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes. ( 0,518975828378358 )
J Biomed Inform - Average correlation clustering algorithm (ACCA) for grouping of co-regulated genes with similar pattern of variation in their expression values. ( 0,516639453760672 )
J Chem Inf Model - Template CoMFA applied to 116 biological targets. ( 0,515384368511351 )
Comput Math Methods Med - A robust rerank approach for feature selection and its application to pooling-based GWA studies. ( 0,514295910340692 )
Brief. Bioinformatics - A large-scale benchmark study of existing algorithms for taxonomy-independent microbial community analysis. ( 0,514186630761208 )
Int J Neural Syst - A genetic graph-based approach for partitional clustering. ( 0,511475977838826 )
J Biomed Inform - Statistical file matching of flow cytometry data. ( 0,509499216261144 )
Int J Med Robot - Intraoperative measurement of femoral antetorsion using the anterior cortical angle method: a novel use for smartphones. ( 0,509362685595084 )
Int J Health Geogr - Incorporating geographical factors with artificial neural networks to predict reference values of erythrocyte sedimentation rate. ( 0,509334406707836 )
IEEE Trans Image Process - Maximum a posteriori video super-resolution using a new multichannel image prior. ( 0,509268468072402 )
Comput Methods Programs Biomed - Bayesian bivariate generalized Lindley model for survival data with a cure fraction. ( 0,50894388882968 )
Med Biol Eng Comput - Dynamic cerebral autoregulation: different signal processing methods without influence on results and reproducibility. ( 0,50775128507045 )
Artif Intell Med - Missing data in medical databases: impute, delete or classify? ( 0,507244956940649 )
J Chem Inf Model - iLOGP: a simple, robust, and efficient description of n-octanol/water partition coefficient for drug design using the GB/SA approach. ( 0,506937321783267 )
Artif Intell Med - Training artificial neural networks directly on the concordance index for censored data using genetic algorithms. ( 0,505534133729461 )
IEEE Trans Image Process - Neighborhood Supported Model Level Fuzzy Aggregation for Moving Object Segmentation. ( 0,504781659421881 )
Neural Comput - Spontaneous clustering via minimum γ-divergence. ( 0,504630905804652 )
J Integr Bioinform - Clustering of gene expression profiles: creating initialization-independent clusterings by eliminating unstable genes. ( 0,504610036961341 )
Comput Methods Programs Biomed - Improvements on a privacy-protection algorithm for DNA sequences with generalization lattices. ( 0,504454721403003 )
J Chem Inf Model - Combined 3D-QSAR, molecular docking, and molecular dynamics study on piperazinyl-glutamate-pyridines/pyrimidines as potent P2Y12 antagonists for inhibition of platelet aggregation. ( 0,503844001401304 )
J Clin Monit Comput - Evaluation of a computer program for non-invasive determination of pulmonary shunt and ventilation-perfusion mismatch. ( 0,503391308707437 )
Comput Math Methods Med - A new particle swarm optimization-based method for phase unwrapping of MRI data. ( 0,502753457991969 )
Comput Methods Programs Biomed - Automated segmentation of optic disc region on retinal fundus photographs: Comparison of contour modeling and pixel classification methods. ( 0,501280335380799 )
Med Biol Eng Comput - Detection of swallows with silent aspiration using swallowing and breath sound analysis. ( 0,501146914092249 )
Comput. Biol. Med. - Analysis of adductors angle measurement in Hammersmith infant neurological examinations using mean shift segmentation and feature point based object tracking. ( 0,499740710290241 )
Comput Math Methods Med - Prediction of breeding values for dairy cattle using artificial neural networks and neuro-fuzzy systems. ( 0,499515936313166 )
Artif Intell Med - Missing data imputation using statistical and machine learning methods in a real breast cancer problem. ( 0,499318944200125 )
Comput. Biol. Med. - Probing the existence of medium pulmonary crackles via model-based clustering. ( 0,498388486389578 )
AMIA Annu Symp Proc - An efficient bayesian method for predicting clinical outcomes from genome-wide data. ( 0,498063719071355 )
Spat Spatiotemporal Epidemiol - Performance of cancer cluster Q-statistics for case-control residential histories. ( 0,497366277831714 )
Neural Comput - High-dimensional cluster analysis with the masked EM algorithm. ( 0,497081155123402 )
J Chem Inf Model - Time-split cross-validation as a method for estimating the goodness of prospective prediction. ( 0,496970722096646 )
Comput Math Methods Med - A wavelet relational fuzzy C-means algorithm for 2D gel image segmentation. ( 0,494911937317638 )
J Biomed Inform - Quantifying the determinants of outbreak detection performance through simulation and machine learning. ( 0,49478445408651 )
J Chem Inf Model - Quantitative structure-activity relationship models for ready biodegradability of chemicals. ( 0,49454433076199 )
J Biomed Inform - Learning patient-specific predictive models from clinical data. ( 0,494413628485657 )
Comput Biol Chem - Multi objective SNP selection using pareto optimality. ( 0,493450723802846 )
J Chem Inf Model - Development of novel 3D-QSAR combination approach for screening and optimizing B-Raf inhibitors in silico. ( 0,493285917379329 )
J Chem Inf Model - Estimation of carcinogenicity using molecular fragments tree. ( 0,492829857683272 )
J Chem Inf Model - New strategy for receptor-based pharmacophore query construction: a case study for 5-HT7 receptor ligands. ( 0,492540902975584 )
Comput Methods Programs Biomed - Systematic method to assess microvascular recruitment using contrast-enhanced ultrasound. Application to insulin-induced capillary recruitment in subjects with T1DM. ( 0,49237932581711 )
J Chem Inf Model - A comparison of different QSAR approaches to modeling CYP450 1A2 inhibition. ( 0,489745961695306 )
Comput Methods Programs Biomed - Mixture and non-mixture cure fraction models based on the generalized modified Weibull distribution with an application to gastric cancer data. ( 0,489470110961577 )
J Chem Inf Model - A new approach to radial basis function approximation and its application to QSAR. ( 0,487848472409138 )
Int J Comput Assist Radiol Surg - Assessing performance in brain tumor resection using a novel virtual reality simulator. ( 0,487732677622749 )
Comput. Biol. Med. - Predicting cardiac autonomic neuropathy category for diabetic data with missing values. ( 0,487473313123156 )
AMIA Annu Symp Proc - Selecting cases for whom additional tests can improve prognostication. ( 0,486654375450278 )
Comput Methods Programs Biomed - Privacy-preserving Kruskal-Wallis test. ( 0,485676657265647 )
Res Synth Methods - A multivariate model for the meta-analysis of study level survival data at multiple times. ( 0,485344959483655 )