Brief. Bioinformatics - An empirical assessment of validation practices for molecular classifiers.

Topics

{ model(2656) set(1616) predict(1553) }
{ featur(3375) classif(2383) classifi(1994) }
{ detect(2391) sensit(1101) algorithm(908) }
{ patient(2315) diseas(1263) diabet(1191) }
{ case(1353) use(1143) diagnosi(1136) }
{ clinic(1479) use(1117) guidelin(835) }
{ time(1939) patient(1703) rate(768) }
{ system(1976) rule(880) can(841) }
{ bind(1733) structur(1185) ligand(1036) }
{ studi(2440) review(1878) systemat(933) }
{ error(1145) method(1030) estim(1020) }
{ general(901) number(790) one(736) }
{ record(1888) medic(1808) patient(1693) }
{ estim(2440) model(1874) function(577) }
{ method(1969) cluster(1462) data(1082) }
{ method(1557) propos(1049) approach(1037) }
{ data(3008) multipl(1320) sourc(1022) }
{ activ(1138) subject(705) human(624) }
{ decis(3086) make(1611) patient(1517) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ take(945) account(800) differ(722) }
{ assess(1506) score(1403) qualiti(1306) }
{ problem(2511) optim(1539) algorithm(950) }
{ model(2220) cell(1177) simul(1124) }
{ import(1318) role(1303) understand(862) }
{ state(1844) use(1261) util(961) }
{ patient(2837) hospit(1953) medic(668) }
{ age(1611) year(1155) adult(843) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ first(2504) two(1366) second(1323) }
{ use(2086) technolog(871) perceiv(783) }
{ use(1733) differ(960) four(931) }
{ result(1111) use(1088) new(759) }
{ process(1125) use(805) approach(778) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ motion(1329) object(1292) video(1091) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ chang(1828) time(1643) increas(1301) }
{ learn(2355) train(1041) set(1003) }
{ concept(1167) ontolog(924) domain(897) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ data(2317) use(1299) case(1017) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ intervent(3218) particip(2042) group(1664) }
{ patient(1821) servic(1111) care(1106) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ activ(1452) weight(1219) physic(1104) }
{ method(2212) result(1239) propos(1039) }

Abstract

Proposed molecular classifiers may be overfit to idiosyncrasies of noisy genomic and proteomic data. Cross-validation methods are often used to obtain estimates of classification accuracy, but both simulations and case studies suggest that, when inappropriate methods are used, bias may ensue. Bias can be bypassed and generalizability can be tested by external (independent) validation. We evaluated 35 studies that have reported on external validation of a molecular classifier. We extracted information on study design and methodological features, and compared the performance of molecular classifiers in internal cross-validation versus external validation for 28 studies where both had been performed. We demonstrate that the majority of studies pursued cross-validation practices that are likely to overestimate classifier performance. Most studies were markedly underpowered to detect a 20% decrease in sensitivity or specificity between internal cross-validation and external validation [median power was 36% (IQR, 21-61%) and 29% (IQR, 15-65%), respectively]. The median reported classification performance for sensitivity and specificity was 94% and 98%, respectively, in cross-validation and 88% and 81% for independent validation. The relative diagnostic odds ratio was 3.26 (95% CI 2.04-5.21) for cross-validation versus independent validation. Finally, we reviewed all studies (n = 758) which cited those in our study sample, and identified only one instance of additional subsequent independent validation of these classifiers. In conclusion, these results document that many cross-validation practices employed in the literature are potentially biased and genuine progress in this field will require adoption of routine external validation of molecular classifiers, preferably in much larger studies than in current practice.
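
As a rough illustration of the diagnostic odds ratio (DOR) mentioned above, the following sketch applies the standard definition, DOR = [sensitivity / (1 - sensitivity)] x [specificity / (1 - specificity)], to the median sensitivity and specificity quoted in the abstract. The function name and the idea of dividing one DOR by the other are assumptions made for illustration only. The abstract does not describe how the relative DOR of 3.26 was estimated across studies, so a ratio computed from medians is not expected to reproduce that figure; it only shows the direction of the difference.

```python
def diagnostic_odds_ratio(sensitivity: float, specificity: float) -> float:
    """Standard DOR: odds of a positive test in cases over odds in non-cases.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not (0.0 < sensitivity < 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    return (sensitivity / (1.0 - sensitivity)) * (specificity / (1.0 - specificity))


# Median values quoted in the abstract.
dor_cv = diagnostic_odds_ratio(0.94, 0.98)    # internal cross-validation
dor_ext = diagnostic_odds_ratio(0.88, 0.81)   # external (independent) validation

# Ratio of the two DORs built from medians. This is NOT the pooled
# relative DOR of 3.26 reported in the abstract.
ratio_of_medians = dor_cv / dor_ext

print(f"DOR (cross-validation):   {dor_cv:.1f}")
print(f"DOR (external validation): {dor_ext:.1f}")
print(f"Ratio of the two:          {ratio_of_medians:.1f}")
```

A ratio above 1 means the classifier looks better in cross-validation than in independent validation, which is the pattern the abstract reports.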

Cleaned Abstract

propos molecular classifi may overfit idiosyncrasi noisi genom proteom data crossvalid method often use obtain estim classif accuraci simul case studi suggest inappropri method use bias may ensu bias can bypass generaliz can test extern independ valid evalu studi report extern valid molecular classifi extract inform studi design methodolog featur compar perform molecular classifi intern crossvalid versus extern valid studi perform demonstr major studi pursu crossvalid practic like overestim classifi perform studi mark underpow detect decreas sensit specif intern crossvalid extern valid median power iqr iqr respect median report classif perform sensit specif respect crossvalid independ valid relat diagnost odd ratio ci crossvalid versus independ valid final review studi n cite studi sampl identifi one instanc addit subsequ independ valid classifi conclus result document mani crossvalid practic employ literatur potenti bias genuin progress field will requir adopt routin extern valid molecular classifi prefer much larger studi current practic

Similar Abstracts

Artif Intell Med - Fuzzy model identification of dengue epidemic in Colombia based on multiresolution analysis. ( 0,722284140442999 )
BMC Med Inform Decis Mak - Developing an algorithm to identify people with Chronic Obstructive Pulmonary Disease (COPD) using administrative data. ( 0,702233149289055 )
Med Biol Eng Comput - Application of the RIMARC algorithm to a large data set of action potentials and clinical parameters for risk prediction of atrial fibrillation. ( 0,691618669497689 )
Methods Inf Med - Reliable blood pressure self-measurement in the obstetric waiting room. ( 0,683657298846632 )
Artif Intell Med - Training artificial neural networks directly on the concordance index for censored data using genetic algorithms. ( 0,681834746779713 )
AMIA Annu Symp Proc - Effect of data combination on predictive modeling: a study using gene expression data. ( 0,681680349515719 )
J Chem Inf Model - Applicability domain based on ensemble learning in classification and regression analyses. ( 0,678287716314022 )
AMIA Annu Symp Proc - Automatic Prediction of Conversion from Mild Cognitive Impairment to Probable Alzheimer's Disease using Structural Magnetic Resonance Imaging. ( 0,672615962567541 )
Comput. Biol. Med. - Extracting predictive SNPs in Crohn's disease using a vacillating genetic algorithm and a neural classifier in case-control association studies. ( 0,656360691917467 )
J Chem Inf Model - Beyond the scope of Free-Wilson analysis: building interpretable QSAR models with machine learning algorithms. ( 0,648712549138651 )
J Biomed Inform - Selection of interdependent genes via dynamic relevance analysis for cancer diagnosis. ( 0,638990745142242 )
Comput. Biol. Med. - Assessing common classification methods for the identification of abnormal repolarization using indicators of T-wave morphology and QT interval. ( 0,638816165453594 )
J Chem Inf Model - Classifier ensemble based on feature selection and diversity measures for predicting the affinity of A(2B) adenosine receptor antagonists. ( 0,62957069596008 )
J Am Med Inform Assoc - Use of a support vector machine for categorizing free-text notes: assessment of accuracy across two institutions. ( 0,629407387132107 )
J Chem Inf Model - Time-split cross-validation as a method for estimating the goodness of prospective prediction. ( 0,624762842339751 )
Comput. Biol. Med. - A prediction model of substrates and non-substrates of breast cancer resistance protein (BCRP) developed by GA-CG-SVM method. ( 0,624721733395598 )
J Chem Inf Model - GRID-based three-dimensional pharmacophores II: PharmBench, a benchmark data set for evaluating pharmacophore elucidation methods. ( 0,608488198712299 )
J Chem Inf Model - In silico prediction of chemical Ames mutagenicity. ( 0,608405477365475 )
J Chem Inf Model - RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes. ( 0,606980021132588 )
J Chem Inf Model - Does rational selection of training and test sets improve the outcome of QSAR modeling? ( 0,605642589816182 )
J Chem Inf Model - A comparison of different QSAR approaches to modeling CYP450 1A2 inhibition. ( 0,605206899455858 )
Comput. Biol. Med. - Myocardial border detection from ventriculograms using support vector machines and real-coded genetic algorithms. ( 0,605174489186072 )
Med Biol Eng Comput - Validating motor unit firing patterns extracted by EMG signal decomposition. ( 0,60113051284246 )
Artif Intell Med - Automatic detection of epileptic seizures on the intra-cranial electroencephalogram of rats using reservoir computing. ( 0,599906037314967 )
J Am Med Inform Assoc - Predicting changes in hypertension control using electronic health records from a chronic disease management program. ( 0,597324584113947 )
BMC Med Inform Decis Mak - Improving sensitivity of machine learning methods for automated case identification from free-text electronic medical records. ( 0,596908191902256 )
J Chem Inf Model - Three useful dimensions for domain applicability in QSAR models using random forest. ( 0,595419991380653 )
J Chem Inf Model - A multiscale simulation system for the prediction of drug-induced cardiotoxicity. ( 0,592240686081178 )
Comput. Biol. Med. - In silico prediction of spleen tyrosine kinase inhibitors using machine learning approaches and an optimized molecular descriptor subset generated by recursive feature elimination method. ( 0,590546824720669 )
BMC Med Inform Decis Mak - Regression tree construction by bootstrap: model search for DRG-systems applied to Austrian health-data. ( 0,585678923110351 )
IEEE Trans Image Process - Neighborhood Supported Model Level Fuzzy Aggregation for Moving Object Segmentation. ( 0,585489671483746 )
J Biomed Inform - MysiRNA: improving siRNA efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy (G). ( 0,585097358285102 )
Comput Methods Programs Biomed - Kinetic modelling of haemodialysis removal of myoglobin in rhabdomyolysis patients. ( 0,584795222474178 )
BMC Med Inform Decis Mak - Sequential detection of influenza epidemics by the Kolmogorov-Smirnov test. ( 0,58388767369945 )
J Chem Inf Model - iLOGP: a simple, robust, and efficient description of n-octanol/water partition coefficient for drug design using the GB/SA approach. ( 0,579977897285549 )
J Chem Inf Model - Study of chromatographic retention of natural terpenoids by chemoinformatic tools. ( 0,576707093274083 )
Int J Health Geogr - Incorporating geographical factors with artificial neural networks to predict reference values of erythrocyte sedimentation rate. ( 0,576402115005128 )
J Chem Inf Model - Impact of template choice on homology model efficiency in virtual screening. ( 0,575902861505485 )
J Chem Inf Model - Pharmacophore assessment through 3-D QSAR: evaluation of the predictive ability on new derivatives by the application on a series of antitubercular agents. ( 0,572574256958919 )
Artif Intell Med - Classification of healthy and abnormal swallows based on accelerometry and nasal airflow signals. ( 0,572246580946379 )
J. Comput. Biol. - The complexity of the dirichlet model for multiple alignment data. ( 0,571601115154542 )
Int J Neural Syst - On the segmentation and classification of hand radiographs. ( 0,562423226661146 )
J Chem Inf Model - In silico prediction of total human plasma clearance. ( 0,561939496144293 )
J Med Syst - Design of an enhanced fuzzy k-nearest neighbor classifier based computer aided diagnostic system for thyroid disease. ( 0,561701717843272 )
AMIA Annu Symp Proc - Predicting discharge mortality after acute ischemic stroke using balanced data. ( 0,56066555381872 )
J Chem Inf Model - Coping with unbalanced class data sets in oral absorption models. ( 0,558816115300345 )
AMIA Annu Symp Proc - Motivating the additional use of external validity: examining transportability in a model of glioblastoma multiforme. ( 0,55870388140291 )
Comput Math Methods Med - An ensemble-of-classifiers based approach for early diagnosis of Alzheimer's disease: classification using structural features of brain images. ( 0,558633274604084 )
Comput Methods Programs Biomed - Automated detection of exudates and macula for grading of diabetic macular edema. ( 0,555420091256988 )
J Med Syst - A neuro-fuzzy identification of ECG beats. ( 0,553791700645234 )
BMC Med Inform Decis Mak - Concordance and predictive value of two adverse drug event data sets. ( 0,553735077288727 )
BMC Med Inform Decis Mak - Measuring preferences for analgesic treatment for cancer pain: how do African-Americans and Whites perform on choice-based conjoint (CBC) analysis experiments? ( 0,55137375514116 )
BMC Med Inform Decis Mak - Identifying patients with diabetes and the earliest date of diagnosis in real time: an electronic health record case-finding algorithm. ( 0,55082424168899 )
Artif Intell Med - Improving predictive models of glaucoma severity by incorporating quality indicators. ( 0,550282478265333 )
J Chem Inf Model - Estimation of carcinogenicity using molecular fragments tree. ( 0,54983694021163 )
Comput. Biol. Med. - SVM-based feature selection to optimize sensitivity-specificity balance applied to weaning. ( 0,549759624594999 )
J Chem Inf Model - Prediction of linear cationic antimicrobial peptides based on characteristics responsible for their interaction with the membranes. ( 0,549280312009066 )
Artif Intell Med - Cancer survival classification using integrated data sets and intermediate information. ( 0,548970973219418 )
J Am Med Inform Assoc - Use of computerized algorithm to identify individuals in need of testing for celiac disease. ( 0,546526419265376 )
J Chem Inf Model - Robust scoring functions for protein-ligand interactions with quantum chemical charge models. ( 0,546058072809368 )
AMIA Annu Symp Proc - Predicting the dengue incidence in Singapore using univariate time series models. ( 0,54526627429889 )
J Chem Inf Model - Ligand and structure-based classification models for prediction of P-glycoprotein inhibitors. ( 0,544277188868438 )
AMIA Annu Symp Proc - Advanced proficiency EHR training: effect on physicians' EHR efficiency, EHR satisfaction and job satisfaction. ( 0,543953734501999 )
Med Decis Making - Developing a tuberculosis transmission model that accounts for changes in population health. ( 0,543783278203049 )
J Chem Inf Model - Oversampling to overcome overfitting: exploring the relationship between data set composition, molecular descriptors, and predictive modeling methods. ( 0,542550110763348 )
J Chem Inf Model - Classification of compounds with distinct or overlapping multi-target activities and diverse molecular mechanisms using emerging chemical patterns. ( 0,541809371489096 )
Comput. Aided Surg. - Evaluation of a computational model to predict elbow range of motion. ( 0,541313563300201 )
J Am Med Inform Assoc - Harvest: an open platform for developing web-based biomedical data discovery and reporting applications. ( 0,540423226188065 )
J Chem Inf Model - Choosing feature selection and learning algorithms in QSAR. ( 0,539586450554805 )
J Chem Inf Model - Predicting pK(a) values of substituted phenols from atomic charges: comparison of different quantum mechanical methods and charge distribution schemes. ( 0,539023921961361 )
Comput. Biol. Med. - Automated identification of normal and diabetes heart rate signals using nonlinear measures. ( 0,537589890184795 )
Comput Methods Programs Biomed - A predictive model of longitudinal, patient-specific colonoscopy results. ( 0,537349399402766 )
J Chem Inf Model - Pre-processing feature selection for improved C&RT models for oral absorption. ( 0,537238957427597 )
Comput. Biol. Med. - A framework for diagnosis of urinary incontinence disease based on scoring measures and automatic classifiers. ( 0,536728086589154 )
Med Biol Eng Comput - Classification of hysteroscopical images using texture and vessel descriptors. ( 0,535753820365143 )
Neural Comput - Kernels for longitudinal data with variable sequence length and sampling intervals. ( 0,534990093932163 )
Comput Methods Programs Biomed - Jump neural network for online short-time prediction of blood glucose from continuous monitoring sensors and meal information. ( 0,534865401251508 )
J Chem Inf Model - Comparative studies on some metrics for external validation of QSPR models. ( 0,5345576196365 )
Brief. Bioinformatics - Exome sequence read depth methods for identifying copy number changes. ( 0,531492681617283 )
Neural Comput - High-dimensional cluster analysis with the masked EM algorithm. ( 0,530423800674206 )
Int J Comput Assist Radiol Surg - Brain tumor classification on intraoperative contrast-enhanced ultrasound. ( 0,530265873985012 )
J Chem Inf Model - Kinase-kernel models: accurate in silico screening of 4 million compounds across the entire human kinome. ( 0,528499931649595 )
J Chem Inf Model - Development of novel 3D-QSAR combination approach for screening and optimizing B-Raf inhibitors in silico. ( 0,528474796800568 )
IEEE J Biomed Health Inform - Accelerometry-based home monitoring for detection of nocturnal hypermotor seizures based on novelty detection. ( 0,526270516873701 )
Comput Methods Programs Biomed - Modeling the glucose regulatory system in extreme preterm infants. ( 0,525048482056712 )
Comput Methods Programs Biomed - An associative memory approach to medical decision support systems. ( 0,523463247101022 )
J. Comput. Biol. - An almost optimal algorithm for generalized threshold group testing with inhibitors. ( 0,523448973986948 )
J Am Med Inform Assoc - Choosing blindly but wisely: differentially private solicitation of DNA datasets for disease marker discovery. ( 0,523033870104853 )
Comput Methods Programs Biomed - Analysis of in-air movement in handwriting: A novel marker for Parkinson's disease. ( 0,522115168414578 )
J Chem Inf Model - Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection. ( 0,520403282672985 )
IEEE J Biomed Health Inform - Automatic identification and classification of muscle spasms in long-term EMG recordings. ( 0,52014086818595 )
J Chem Inf Model - Introducing conformal prediction in predictive modeling. A transparent and flexible alternative to applicability domain determination. ( 0,519038105290741 )
Comput. Biol. Med. - Fast opposite weight learning rules with application in breast cancer diagnosis. ( 0,518981966290642 )
Comput. Biol. Med. - Automated diagnosis of Age-related Macular Degeneration using greyscale features from digital fundus images. ( 0,518148761772956 )
J Chem Inf Model - Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets. ( 0,517713317162142 )
Comput. Biol. Med. - Assessment of multichannel lung sounds parameterization for two-class classification in interstitial lung disease patients. ( 0,517159744267362 )
J Chem Inf Model - Rank order entropy: why one metric is not enough. ( 0,51660406147139 )
Comput. Biol. Med. - A hybrid intelligent system for diagnosing microalbuminuria in type 2 diabetes patients without having to measure urinary albumin. ( 0,515460115048282 )
J Chem Inf Model - Statistical analysis and compound selection of combinatorial libraries for soluble epoxide hydrolase. ( 0,515385158002897 )
J Chem Inf Model - Optimizing predictive performance of CASE Ultra expert system models using the applicability domains of individual toxicity alerts. ( 0,514169376591627 )