J Chem Inf Model - Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection.


Evaluating the performance of regression QSAR models in fitting, robustness, and external prediction is of pivotal importance. Over the past decade, several external validation parameters have been proposed: $Q^2_{F1}$, $Q^2_{F2}$, $Q^2_{F3}$, $r^2_m$, and the Golbraikh-Tropsha method. Recently, our group proposed the concordance correlation coefficient (CCC, Lin) as an external validation parameter for QSAR studies; it simply verifies how small the differences are between the experimental data and the external data set predictions, independently of their range. In our preliminary work, we demonstrated with thousands of simulated models that CCC is in good agreement with the compared validation criteria (except $r^2_m$) when using the cutoff values normally applied to accept QSAR models as externally predictive. In this new work, we studied and compared the general trends of the various criteria under different possible biases (scale and location shifts) in the external data distributions, using a wide range of simulated scenarios. This study, further supported by visual inspection of scatter plots of experimental vs predicted data, highlighted problems with some criteria. Indeed, with the cutoff suggested by its proponent, $r^2_m$ could accept nonpredictive models under two of the possible biases (location, and location plus scale), while under scale shift bias it appears to be the most restrictive criterion. Moreover, $Q^2_{F1}$ and $Q^2_{F2}$ showed some problems under one of the possible biases (scale shift). This analysis also allowed us to propose new, recalibrated thresholds for each criterion, intercomparable for the same data scatter, for defining a QSAR model as truly externally predictive under a more precautionary approach.
An analysis of the results revealed that the scatter plot of experimental vs predicted external data must always be inspected to support the values of the statistical criteria: in some cases, high statistical parameter values could hide models with unacceptable predictions.
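
The abstract names the criteria but does not define them, so the following is only a minimal sketch of how CCC and the three $Q^2_F$ variants are commonly written in the QSAR literature; it is not taken from the paper itself. The function names (`q2_f1`, `q2_f2`, `q2_f3`, `ccc`) and the toy data are assumptions made for illustration, and $r^2_m$ and the Golbraikh-Tropsha checks are left out. The toy example applies a pure location shift to otherwise perfect predictions to show how each value reacts; no acceptance threshold is implied.

```python
from statistics import fmean


def _sum_sq_dev(values, center):
    """Sum of squared deviations of `values` from `center`."""
    return sum((v - center) ** 2 for v in values)


def _press(y_ext, y_pred):
    """Sum of squared prediction errors on the external set."""
    return sum((y - p) ** 2 for y, p in zip(y_ext, y_pred))


def q2_f1(y_ext, y_pred, y_train):
    """Q2_F1: external errors vs. external spread around the TRAINING mean."""
    return 1.0 - _press(y_ext, y_pred) / _sum_sq_dev(y_ext, fmean(y_train))


def q2_f2(y_ext, y_pred):
    """Q2_F2: external errors vs. external spread around the EXTERNAL mean."""
    return 1.0 - _press(y_ext, y_pred) / _sum_sq_dev(y_ext, fmean(y_ext))


def q2_f3(y_ext, y_pred, y_train):
    """Q2_F3: mean squared external error vs. variance of the training responses."""
    mse_ext = _press(y_ext, y_pred) / len(y_ext)
    var_train = _sum_sq_dev(y_train, fmean(y_train)) / len(y_train)
    return 1.0 - mse_ext / var_train


def ccc(y_ext, y_pred):
    """Lin's concordance correlation coefficient between observed and predicted values."""
    mean_obs, mean_pred = fmean(y_ext), fmean(y_pred)
    cov = sum((y - mean_obs) * (p - mean_pred) for y, p in zip(y_ext, y_pred))
    denom = (
        _sum_sq_dev(y_ext, mean_obs)
        + _sum_sq_dev(y_pred, mean_pred)
        + len(y_ext) * (mean_obs - mean_pred) ** 2  # penalty for a location shift
    )
    return 2.0 * cov / denom


# Toy data (hypothetical, for illustration only).
y_train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_ext = [2.0, 4.0, 6.0, 8.0]
y_shifted = [y + 1.5 for y in y_ext]  # perfect trend, constant location bias

ccc_perfect = ccc(y_ext, y_ext)
ccc_shifted = ccc(y_ext, y_shifted)
q2_f1_shifted = q2_f1(y_ext, y_shifted, y_train)
q2_f2_shifted = q2_f2(y_ext, y_shifted)
q2_f3_shifted = q2_f3(y_ext, y_shifted, y_train)

print(f"CCC perfect={ccc_perfect:.4f} shifted={ccc_shifted:.4f}")
print(f"Q2_F1={q2_f1_shifted:.4f} Q2_F2={q2_f2_shifted:.4f} Q2_F3={q2_f3_shifted:.4f}")
```

In this toy case the shifted predictions still lie on a straight line parallel to the diagonal, so a plain correlation coefficient would not flag the bias, whereas all four values above drop below 1. Plotting `y_ext` against `y_shifted` would make the offset from the diagonal visible at a glance, which is the kind of scatter plot inspection the abstract calls for.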

Clean Abstract

evalu regress qsar model perform fit robust extern predict pivot import past decad differ extern valid paramet propos qf qf qf rm golbraikhtropsha method recent concord correl coeffici ccc lin simpli verifi small differ experiment data extern data set predict independ rang propos group extern valid paramet use qsar studi preliminari work demonstr thousand simul model ccc good agreement compar valid criteria except rm use cutoff valu normal appli accept qsar model extern predict new work studi compar general trend various criteria relat differ possibl bias scale locat shift extern data distribut use wide rang differ simul scenario studi support visual inspect experiment vs predict data scatter plot highlight problem relat criteria inde base cutoff suggest propon rm also accept predict model two possibl bias locat locat plus scale case scale shift bias appear restrict moreov qf qf show problem one possibl bias scale shift analysi allow us also propos recalibr intercompar data scatter new threshold criterion defin qsar model realli extern predict precautionari approach analysi result reveal scatter plot experiment vs predict extern data must alway evalu support statist criteria valu case high statist paramet valu hide model unaccept predict

Similar Abstracts

J Chem Inf Model - Comparative studies on some metrics for external validation of QSPR models. ( 0,783475796226387 )
Int J Health Geogr - Incorporating geographical factors with artificial neural networks to predict reference values of erythrocyte sedimentation rate. ( 0,708397989783764 )
BMC Med Inform Decis Mak - Concordance and predictive value of two adverse drug event data sets. ( 0,70490930600172 )
J Chem Inf Model - iLOGP: a simple, robust, and efficient description of n-octanol/water partition coefficient for drug design using the GB/SA approach. ( 0,699831618824239 )
Neural Comput - Molecular diffusion model of neurotransmitter homeostasis around synapses supporting gradients. ( 0,69151801951024 )
J Chem Inf Model - Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. ( 0,680586449641865 )
J Chem Inf Model - Time-split cross-validation as a method for estimating the goodness of prospective prediction. ( 0,679504477878997 )
J Chem Inf Model - Study of chromatographic retention of natural terpenoids by chemoinformatic tools. ( 0,676850741459108 )
Int J Health Geogr - Comparative analysis of remotely-sensed data products via ecological niche modeling of avian influenza case occurrences in Middle Eastern poultry. ( 0,675340736158566 )
J Chem Inf Model - Three useful dimensions for domain applicability in QSAR models using random forest. ( 0,672929079427563 )
J Chem Inf Model - GRID-based three-dimensional pharmacophores II: PharmBench, a benchmark data set for evaluating pharmacophore elucidation methods. ( 0,668091685335566 )
AMIA Annu Symp Proc - Effect of data combination on predictive modeling: a study using gene expression data. ( 0,66802337762904 )
Artif Intell Med - Training artificial neural networks directly on the concordance index for censored data using genetic algorithms. ( 0,66180283123542 )
Comput Methods Programs Biomed - A therapy parameter-based model for predicting blood glucose concentrations in patients with type 1 diabetes. ( 0,656500891024231 )
J Chem Inf Model - Beyond the scope of Free-Wilson analysis: building interpretable QSAR models with machine learning algorithms. ( 0,639529771014122 )
J Chem Inf Model - Does rational selection of training and test sets improve the outcome of QSAR modeling? ( 0,637384829649579 )
AMIA Annu Symp Proc - Predicting the dengue incidence in Singapore using univariate time series models. ( 0,631779791377933 )
Comput Methods Programs Biomed - Kinetic modelling of haemodialysis removal of myoglobin in rhabdomyolysis patients. ( 0,628593253076342 )
Comput Methods Programs Biomed - Interstitial insulin kinetic parameters for a 2-compartment insulin model with saturable clearance. ( 0,622009444685891 )
Med Biol Eng Comput - Application of the RIMARC algorithm to a large data set of action potentials and clinical parameters for risk prediction of atrial fibrillation. ( 0,620824648630997 )
Comput. Aided Surg. - Evaluation of a computational model to predict elbow range of motion. ( 0,619115913635133 )
AMIA Annu Symp Proc - Motivating the additional use of external validity: examining transportability in a model of glioblastoma multiforme. ( 0,617929976822666 )
J Chem Inf Model - RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes. ( 0,615394695435974 )
Methods Inf Med - Prediction model for glucose metabolism based on lipid metabolism. ( 0,614222405466486 )
J Chem Inf Model - Prediction of linear cationic antimicrobial peptides based on characteristics responsible for their interaction with the membranes. ( 0,605018805310607 )
J Chem Inf Model - Predicting pK(a) values of substituted phenols from atomic charges: comparison of different quantum mechanical methods and charge distribution schemes. ( 0,602953869414244 )
J Biomed Inform - MysiRNA: improving siRNA efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy (G). ( 0,601322403256202 )
J Chem Inf Model - Pharmacophore assessment through 3-D QSAR: evaluation of the predictive ability on new derivatives by the application on a series of antitubercular agents. ( 0,597874280606161 )
Comput Methods Programs Biomed - A 5-component mathematical model for salt-induced hypertension in Dahl-S and Dahl-R rats. ( 0,597504403342951 )
Int J Comput Assist Radiol Surg - Assessing performance in brain tumor resection using a novel virtual reality simulator. ( 0,59182733014512 )
Med Decis Making - Prediction of health preference values from CD4 counts in individuals with HIV. ( 0,589412967845765 )
J Chem Inf Model - Combined 3D-QSAR, molecular docking, and molecular dynamics study on piperazinyl-glutamate-pyridines/pyrimidines as potent P2Y12 antagonists for inhibition of platelet aggregation. ( 0,587605870740813 )
J Chem Inf Model - Best of both worlds: combining pharma data and state of the art modeling technology to improve in Silico pKa prediction. ( 0,586015738790755 )
Artif Intell Med - Fuzzy model identification of dengue epidemic in Colombia based on multiresolution analysis. ( 0,584738951860554 )
J Chem Inf Model - Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets. ( 0,582632865310674 )
J Chem Inf Model - Design of novel FLT-3 inhibitors based on dual-layer 3D-QSAR model and fragment-based compounds in silico. ( 0,582487301102291 )
AMIA Annu Symp Proc - Advanced proficiency EHR training: effect on physicians' EHR efficiency, EHR satisfaction and job satisfaction. ( 0,578697531358086 )
J Chem Inf Model - Rank order entropy: why one metric is not enough. ( 0,578290781958801 )
J. Comput. Biol. - Rich parameterization improves RNA structure prediction. ( 0,576662985592989 )
Artif Intell Med - A machine learning-based approach to prognostic analysis of thoracic transplantations. ( 0,574205406863374 )
Med Decis Making - Developing a tuberculosis transmission model that accounts for changes in population health. ( 0,571990019685081 )
BMC Med Inform Decis Mak - Measuring preferences for analgesic treatment for cancer pain: how do African-Americans and Whites perform on choice-based conjoint (CBC) analysis experiments? ( 0,571236991466359 )
J Chem Inf Model - Combined receptor and ligand-based approach to the universal pharmacophore model development for studies of drug blockade to the hERG1 pore domain. ( 0,570640359125231 )
J. Comput. Biol. - The complexity of the dirichlet model for multiple alignment data. ( 0,566101318900955 )
Comput. Biol. Med. - A prediction model of substrates and non-substrates of breast cancer resistance protein (BCRP) developed by GA-CG-SVM method. ( 0,566024208821055 )
J Chem Inf Model - A new approach to radial basis function approximation and its application to QSAR. ( 0,565148854834819 )
J Chem Inf Model - Applicability Domain ANalysis (ADAN): a robust method for assessing the reliability of drug property predictions. ( 0,564930707228086 )
J Chem Inf Model - CSAR data set release 2012: ligands, affinities, complexes, and docking decoys. ( 0,564374448853346 )
Med Biol Eng Comput - Accelerometry-based prediction of movement dynamics for balance monitoring. ( 0,559932920918191 )
J Chem Inf Model - Design and synthesis of new antioxidants predicted by the model developed on a set of pulvinic acid derivatives. ( 0,555266898094637 )
J Chem Inf Model - In silico prediction of total human plasma clearance. ( 0,554741106804562 )
Comput Methods Programs Biomed - Modelling the Double Peak Phenomenon in pharmacokinetics. ( 0,553874459107238 )
J Chem Inf Model - Criterion for evaluating the predictive ability of nonlinear regression models without cross-validation. ( 0,551725541798714 )
Comput Math Methods Med - Multiscale autoregressive identification of neuroelectrophysiological systems. ( 0,549961347068141 )
J Chem Inf Model - Applicability domain based on ensemble learning in classification and regression analyses. ( 0,547950236774517 )
Comput Methods Programs Biomed - Modelling of tumour growth and cytotoxic effect of docetaxel in xenografts. ( 0,547565278949876 )
J Chem Inf Model - Classification of compounds with distinct or overlapping multi-target activities and diverse molecular mechanisms using emerging chemical patterns. ( 0,545365019413373 )
BMC Med Inform Decis Mak - Influence of data quality on computed Dutch hospital quality indicators: a case study in colorectal cancer surgery. ( 0,544999467774217 )
J Chem Inf Model - Impact of template choice on homology model efficiency in virtual screening. ( 0,54373748118765 )
BMC Med Inform Decis Mak - Regression tree construction by bootstrap: model search for DRG-systems applied to Austrian health-data. ( 0,540944866489599 )
BMC Med Inform Decis Mak - Is it possible to identify cases of coronary artery bypass graft postoperative surgical site infection accurately from claims data? ( 0,540915118266287 )
J Am Med Inform Assoc - Choosing blindly but wisely: differentially private solicitation of DNA datasets for disease marker discovery. ( 0,540573279965352 )
J Chem Inf Model - Building a three-dimensional model of CYP2C9 inhibition using the Autocorrelator: an autonomous model generator. ( 0,53885287385774 )
AMIA Annu Symp Proc - Identifying Deviations from Usual Medical Care using a Statistical Approach. ( 0,537030725556188 )
J Biomed Inform - Markov blanket-based approach for learning multi-dimensional Bayesian network classifiers: an application to predict the European Quality of Life-5 Dimensions (EQ-5D) from the 39-item Parkinson's Disease Questionnaire (PDQ-39). ( 0,536021877096849 )
Comput Methods Programs Biomed - Modeling the glucose regulatory system in extreme preterm infants. ( 0,532164234070003 )
Spat Spatiotemporal Epidemiol - Spatial approximations of network-based individual level infectious disease models. ( 0,531842442719553 )
Comput. Biol. Med. - On a reusable and multilevel methodology for modeling and simulation of pharmacokinetic-physiological systems: a preliminary study. ( 0,529265200851225 )
J Chem Inf Model - Coping with unbalanced class data sets in oral absorption models. ( 0,52835683440047 )
J. Med. Internet Res. - A case study of the New York City 2012-2013 influenza season with daily geocoded Twitter data from temporal and spatiotemporal perspectives. ( 0,526344078563442 )
Comput Math Methods Med - Global stability analysis of SEIR model with holling type II incidence function. ( 0,526313986225762 )
J Chem Inf Model - In silico prediction of aqueous solubility using simple QSPR models: the importance of phenol and phenol-like moieties. ( 0,52609216997448 )
AMIA Annu Symp Proc - Ontology-based federated data access to human studies information. ( 0,524537911117619 )
Med Biol Eng Comput - Use of a comprehensive numerical model to improve biventricular pacemaker temporization in patients affected by heart failure undergoing to CRT-D therapy. ( 0,523216740729905 )
Int J Neural Syst - A longitudinal EEG study of Alzheimer's disease progression based on a complex network approach. ( 0,519811012168935 )
J Chem Inf Model - Quantitative structure-activity relationship models for ready biodegradability of chemicals. ( 0,519085581173914 )
Med Biol Eng Comput - Power type strain energy function model and prediction of the anisotropic mechanical properties of skin using uniaxial extension data. ( 0,517704627348789 )
J Am Med Inform Assoc - Harvest: an open platform for developing web-based biomedical data discovery and reporting applications. ( 0,513654007359512 )
Wiley Interdiscip Rev Syst Biol Med - Mechanistic modeling to investigate signaling by oncogenic Ras mutants. ( 0,513622565250869 )
Med Biol Eng Comput - CSF dynamic analysis of a predictive pulsatility-based infusion test for normal pressure hydrocephalus. ( 0,511221052244372 )
IEEE Trans Pattern Anal Mach Intell - Learning Pullback HMM Distances. ( 0,509927835567604 )
J. Comput. Biol. - Boolean models can explain bistability in the lac operon. ( 0,50992433536049 )
J Biomed Inform - Quantifying the costs and benefits of privacy-preserving health data publishing. ( 0,506738669395479 )
J Chem Inf Model - In silico prediction of chemical Ames mutagenicity. ( 0,505445128880331 )
J Chem Inf Model - Statistical analysis and compound selection of combinatorial libraries for soluble epoxide hydrolase. ( 0,505004093599274 )
Med Biol Eng Comput - Development of a comprehensive musculoskeletal model of the shoulder and elbow. ( 0,50429320828808 )
Comput Methods Programs Biomed - A predictive model of longitudinal, patient-specific colonoscopy results. ( 0,503801095736119 )
Spat Spatiotemporal Epidemiol - Spatial modelling of disease using data- and knowledge-driven approaches. ( 0,503514664029758 )
Comput Math Methods Med - Modeling innate immune response to early Mycobacterium infection. ( 0,502592360174781 )
IEEE Trans Image Process - Incremental N-mode SVD for large-scale multilinear generative models. ( 0,50222032386827 )
IEEE Trans Vis Comput Graph - Model Synthesis: A General Procedural Modeling Algorithm. ( 0,501018839839877 )
J Chem Inf Model - Template CoMFA: the 3D-QSAR Grail? ( 0,500183044179143 )
IEEE J Biomed Health Inform - Prediction of Hemodynamic Response to Epinephrine via Model-Based System Identification. ( 0,499584268366075 )
J Chem Inf Model - Four-dimensional structure-activity relationship model to predict HIV-1 integrase strand transfer inhibition using LQTA-QSAR methodology. ( 0,498112795157932 )
J Chem Inf Model - Optimizing predictive performance of CASE Ultra expert system models using the applicability domains of individual toxicity alerts. ( 0,497307271632338 )
J Chem Inf Model - Development of novel 3D-QSAR combination approach for screening and optimizing B-Raf inhibitors in silico. ( 0,496733832891951 )
Comput. Biol. Med. - Quantification of contributions of molecular fragments for eye irritation of organic chemicals using QSAR study. ( 0,495659879713106 )
J Chem Inf Model - Robust scoring functions for protein-ligand interactions with quantum chemical charge models. ( 0,493924274532787 )
Comput. Biol. Med. - Complex activity patterns in arterial wall: results from a model of calcium dynamics. ( 0,493428584311054 )
J Chem Inf Model - Similarity coefficients for binary chemoinformatics data: overview and extended comparison using simulated and real data sets. ( 0,492681192825051 )