J Chem Inf Model - Does rational selection of training and test sets improve the outcome of QSAR modeling?

Topics

{ model(2656) set(1616) predict(1553) }
{ method(1969) cluster(1462) data(1082) }
{ state(1844) use(1261) util(961) }
{ model(2341) predict(2261) use(1141) }
{ assess(1506) score(1403) qualiti(1306) }
{ group(2977) signific(1463) compar(1072) }
{ data(3008) multipl(1320) sourc(1022) }
{ perform(1367) use(1326) method(1137) }
{ compound(1573) activ(1297) structur(1058) }
{ system(1976) rule(880) can(841) }
{ learn(2355) train(1041) set(1003) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ measur(2081) correl(1212) valu(896) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ patient(2315) diseas(1263) diabet(1191) }
{ clinic(1479) use(1117) guidelin(835) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ perform(999) metric(946) measur(919) }
{ studi(1119) effect(1106) posit(819) }
{ monitor(1329) mobil(1314) devic(1160) }
{ analysi(2126) use(1163) compon(1037) }
{ implement(1333) system(1263) develop(1122) }
{ can(774) often(719) complex(702) }
{ inform(2794) health(2639) internet(1427) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ search(2224) databas(1162) retriev(909) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Prior to using a quantitative structure-activity relationship (QSAR) model for external predictions, its predictive power should be established and validated. In the absence of a true external data set, the best way to validate the predictive ability of a model is to perform statistical external validation, in which the overall data set is divided into training and test sets. Commonly, this splitting is performed by random division. Rational splitting methods instead divide data sets into training and test sets in an intelligent fashion. The purpose of this study was to determine whether rational division methods lead to more predictive models than random division. A special data splitting procedure was used to facilitate the comparison between random and rational division methods. For each toxicity end point, the overall data set was divided by random division into a modeling set (80% of the overall set) and an external evaluation set (20% of the overall set). The modeling set was then subdivided into a training set (80% of the modeling set) and a test set (20% of the modeling set), once using rational division methods and once using random division. The Kennard-Stone, minimal test set dissimilarity, and sphere exclusion algorithms were used as the rational division methods. The hierarchical clustering, random forest, and k-nearest neighbor (kNN) methods were used to develop QSAR models based on the training sets. For kNN QSAR, multiple training and test sets were generated, and multiple QSAR models were built. The results of this study indicate that models based on rational division methods generate better statistical results for the test sets than models based on random division, but the predictive power of both types of models is comparable.
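The two-level splitting procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the descriptors or the distance metric, so Euclidean distance on made-up two-dimensional descriptors is assumed here, and the function names `random_split` and `kennard_stone_split` are hypothetical. Only Kennard-Stone is shown as the rational method; the minimal test set dissimilarity and sphere exclusion algorithms are omitted.

```python
import math
import random


def random_split(indices, frac, rng):
    """Randomly split indices; roughly `frac` of them go to the first part."""
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = round(frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def kennard_stone_split(X, indices, frac):
    """Kennard-Stone selection (assumes at least two points are selected).

    Seeds the training set with the two most distant points, then repeatedly
    adds the point whose nearest already-selected point is farthest away.
    Euclidean distance is an assumption, not taken from the abstract.
    """
    remaining = list(indices)
    n_train = round(frac * len(remaining))
    # Seed with the most distant pair of points.
    a, b = max(
        ((i, j) for k, i in enumerate(remaining) for j in remaining[k + 1:]),
        key=lambda pair: math.dist(X[pair[0]], X[pair[1]]),
    )
    selected = [a, b]
    remaining = [i for i in remaining if i not in (a, b)]
    # min_dist[i]: distance from candidate i to its nearest selected point.
    min_dist = {
        i: min(math.dist(X[i], X[a]), math.dist(X[i], X[b])) for i in remaining
    }
    while len(selected) < n_train:
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        remaining.remove(nxt)
        del min_dist[nxt]
        for i in remaining:
            min_dist[i] = min(min_dist[i], math.dist(X[i], X[nxt]))
    return selected, remaining


# Toy data: 100 compounds with two made-up descriptors each.
rng = random.Random(0)
X = [[rng.random(), rng.random()] for _ in range(100)]

# Level 1: random 80/20 split into modeling and external evaluation sets.
modeling, external = random_split(range(100), 0.8, rng)

# Level 2: split the modeling set 80/20, once rationally and once randomly.
train_rational, test_rational = kennard_stone_split(X, modeling, 0.8)
train_random, test_random = random_split(modeling, 0.8, rng)
```

Because the external evaluation set is drawn at random before any rational selection takes place, it is the same for both kinds of training set, which is what allows their predictive power to be compared on equal terms.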

Cleaned Abstract

prior use quantit structur activ relationship qsar model extern predict predict power establish valid absenc true extern data set best way valid predict abil model perform statist extern valid statist extern valid overal data set divid train test set common split perform use random divis ration split method can divid data set train test set intellig fashion purpos studi determin whether ration divis method lead predict model compar random divis special data split procedur use facilit comparison random ration divis method toxic end point overal data set divid model set overal set extern evalu set overal set use random divis model set subdivid train set model set test set model set use ration divis method use random divis kennardston minim test set dissimilar sphere exclus algorithm use ration divis method hierarch cluster random forest knearest neighbor knn method use develop qsar model base train set knn qsar multipl train test set generat multipl qsar model built result studi indic model base ration divis method generat better statist result test set model base random divis predict power type model compar

Similar Abstracts

J Chem Inf Model - Time-split cross-validation as a method for estimating the goodness of prospective prediction. ( 0,899596257147462 )
Artif Intell Med - Training artificial neural networks directly on the concordance index for censored data using genetic algorithms. ( 0,878191954798276 )
AMIA Annu Symp Proc - Effect of data combination on predictive modeling: a study using gene expression data. ( 0,850117899226883 )
J Chem Inf Model - Study of chromatographic retention of natural terpenoids by chemoinformatic tools. ( 0,847561707491845 )
J Chem Inf Model - Beyond the scope of Free-Wilson analysis: building interpretable QSAR models with machine learning algorithms. ( 0,798620009510963 )
J Chem Inf Model - RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes. ( 0,797431586427238 )
J Chem Inf Model - iLOGP: a simple, robust, and efficient description of n-octanol/water partition coefficient for drug design using the GB/SA approach. ( 0,795821545995318 )
J Chem Inf Model - GRID-based three-dimensional pharmacophores II: PharmBench, a benchmark data set for evaluating pharmacophore elucidation methods. ( 0,792825092282565 )
AMIA Annu Symp Proc - Motivating the additional use of external validity: examining transportability in a model of glioblastoma multiforme. ( 0,789215145319326 )
BMC Med Inform Decis Mak - Concordance and predictive value of two adverse drug event data sets. ( 0,785269329259812 )
J Chem Inf Model - Predicting pK(a) values of substituted phenols from atomic charges: comparison of different quantum mechanical methods and charge distribution schemes. ( 0,771199602437797 )
AMIA Annu Symp Proc - Advanced proficiency EHR training: effect on physicians' EHR efficiency, EHR satisfaction and job satisfaction. ( 0,75660132303771 )
J Chem Inf Model - Comparative studies on some metrics for external validation of QSPR models. ( 0,752581721369186 )
J Chem Inf Model - Pharmacophore assessment through 3-D QSAR: evaluation of the predictive ability on new derivatives by the application on a series of antitubercular agents. ( 0,743856329461145 )
Int J Health Geogr - Incorporating geographical factors with artificial neural networks to predict reference values of erythrocyte sedimentation rate. ( 0,741430917591749 )
BMC Med Inform Decis Mak - Regression tree construction by bootstrap: model search for DRG-systems applied to Austrian health-data. ( 0,740353689515533 )
J Chem Inf Model - Rank order entropy: why one metric is not enough. ( 0,738762600708751 )
AMIA Annu Symp Proc - Predicting the dengue incidence in Singapore using univariate time series models. ( 0,734330318841774 )
J Chem Inf Model - Three useful dimensions for domain applicability in QSAR models using random forest. ( 0,721851501135653 )
J Chem Inf Model - Statistical analysis and compound selection of combinatorial libraries for soluble epoxide hydrolase. ( 0,710638501152784 )
BMC Med Inform Decis Mak - Measuring preferences for analgesic treatment for cancer pain: how do African-Americans and Whites perform on choice-based conjoint (CBC) analysis experiments? ( 0,705896998792239 )
Comput. Aided Surg. - Evaluation of a computational model to predict elbow range of motion. ( 0,700681335958255 )
J. Comput. Biol. - The complexity of the dirichlet model for multiple alignment data. ( 0,696089588018391 )
J Chem Inf Model - Impact of template choice on homology model efficiency in virtual screening. ( 0,693432366831634 )
J Chem Inf Model - In silico prediction of total human plasma clearance. ( 0,691618425230229 )
Int J Comput Assist Radiol Surg - Assessing performance in brain tumor resection using a novel virtual reality simulator. ( 0,691228197655122 )
Comput. Biol. Med. - Artificial neural network modelling of the results of tympanoplasty in chronic suppurative otitis media patients. ( 0,686046434921785 )
J Chem Inf Model - Development of novel 3D-QSAR combination approach for screening and optimizing B-Raf inhibitors in silico. ( 0,684985603269335 )
Artif Intell Med - Fuzzy model identification of dengue epidemic in Colombia based on multiresolution analysis. ( 0,683109010145606 )
J Chem Inf Model - Estimation of carcinogenicity using molecular fragments tree. ( 0,679006078936851 )
J Chem Inf Model - In silico prediction of aqueous solubility using simple QSPR models: the importance of phenol and phenol-like moieties. ( 0,677826028691733 )
J Am Med Inform Assoc - Harvest: an open platform for developing web-based biomedical data discovery and reporting applications. ( 0,6748437030215 )
Int J Health Geogr - Comparative analysis of remotely-sensed data products via ecological niche modeling of avian influenza case occurrences in Middle Eastern poultry. ( 0,672380822875434 )
J Am Med Inform Assoc - Choosing blindly but wisely: differentially private solicitation of DNA datasets for disease marker discovery. ( 0,67069642241569 )
J Biomed Inform - MysiRNA: improving siRNA efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy (ΔG). ( 0,669646308584271 )
J Chem Inf Model - Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets. ( 0,66848761914136 )
J Chem Inf Model - Best of both worlds: combining pharma data and state of the art modeling technology to improve in Silico pKa prediction. ( 0,66316586819265 )
Med Decis Making - Developing a tuberculosis transmission model that accounts for changes in population health. ( 0,661584027653245 )
J Chem Inf Model - Classification of compounds with distinct or overlapping multi-target activities and diverse molecular mechanisms using emerging chemical patterns. ( 0,65692733171932 )
Med Biol Eng Comput - Application of the RIMARC algorithm to a large data set of action potentials and clinical parameters for risk prediction of atrial fibrillation. ( 0,653761096215518 )
J. Med. Internet Res. - A case study of the New York City 2012-2013 influenza season with daily geocoded Twitter data from temporal and spatiotemporal perspectives. ( 0,653155468973813 )
Med Biol Eng Comput - Validating motor unit firing patterns extracted by EMG signal decomposition. ( 0,653155021083543 )
Brief. Bioinformatics - Data construction for phosphorylation site prediction. ( 0,650450559433382 )
Comput Methods Programs Biomed - A predictive model of longitudinal, patient-specific colonoscopy results. ( 0,647616310865571 )
J Chem Inf Model - Criterion for evaluating the predictive ability of nonlinear regression models without cross-validation. ( 0,644891072780561 )
Spat Spatiotemporal Epidemiol - Spatial modelling of disease using data- and knowledge-driven approaches. ( 0,644126317579292 )
Comput. Biol. Med. - A prediction model of substrates and non-substrates of breast cancer resistance protein (BCRP) developed by GA-CG-SVM method. ( 0,643933569147651 )
J Chem Inf Model - Prediction of linear cationic antimicrobial peptides based on characteristics responsible for their interaction with the membranes. ( 0,642238968226974 )
J Chem Inf Model - Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. ( 0,637384829649579 )
Med Biol Eng Comput - Optimal design of clinical tests for the identification of physiological models of type 1 diabetes in the presence of model mismatch. ( 0,632708975990105 )
Comput Methods Programs Biomed - Kinetic modelling of haemodialysis removal of myoglobin in rhabdomyolysis patients. ( 0,631508772935947 )
J Chem Inf Model - Applicability domain based on ensemble learning in classification and regression analyses. ( 0,630873465337114 )
IEEE Trans Image Process - Neighborhood Supported Model Level Fuzzy Aggregation for Moving Object Segmentation. ( 0,630822330477585 )
J Chem Inf Model - Introducing conformal prediction in predictive modeling. A transparent and flexible alternative to applicability domain determination. ( 0,627184585328192 )
Artif Intell Med - A machine learning-based approach to prognostic analysis of thoracic transplantations. ( 0,627133257057545 )
J Chem Inf Model - Robust scoring functions for protein-ligand interactions with quantum chemical charge models. ( 0,625813424249697 )
Int J Med Inform - Design and implementation of I2Vote--an interactive image-based voting system using windows mobile devices. ( 0,621132752235921 )
J Chem Inf Model - Hsp90 inhibitors, part 1: definition of 3-D QSAutogrid/R models as a tool for virtual screening. ( 0,620711208370737 )
J Chem Inf Model - A new approach to radial basis function approximation and its application to QSAR. ( 0,61926214345273 )
Int J Comput Assist Radiol Surg - Hybrid image visualization tool for 3D integration of CT coronary anatomy and quantitative myocardial perfusion PET. ( 0,617441570978344 )
J Chem Inf Model - Binary classification of a large collection of environmental chemicals from estrogen receptor assays by quantitative structure-activity relationship and machine learning methods. ( 0,616892655752267 )
J Chem Inf Model - Oversampling to overcome overfitting: exploring the relationship between data set composition, molecular descriptors, and predictive modeling methods. ( 0,616719732949385 )
Comput Math Methods Med - Multiscale autoregressive identification of neuroelectrophysiological systems. ( 0,614017400804264 )
J Chem Inf Model - Four-dimensional structure-activity relationship model to predict HIV-1 integrase strand transfer inhibition using LQTA-QSAR methodology. ( 0,611829612299263 )
J Am Med Inform Assoc - Use of a support vector machine for categorizing free-text notes: assessment of accuracy across two institutions. ( 0,611571726449446 )
J Chem Inf Model - In silico prediction of chemical Ames mutagenicity. ( 0,611276568317295 )
J Chem Inf Model - Comparison of random forest and Pipeline Pilot Naïve Bayes in prospective QSAR predictions. ( 0,610357676475417 )
J Chem Inf Model - CSAR data set release 2012: ligands, affinities, complexes, and docking decoys. ( 0,609550759856699 )
J Chem Inf Model - Experimental and computational prediction of glass transition temperature of drugs. ( 0,609429425431237 )
Comput Methods Programs Biomed - Bayesian bivariate generalized Lindley model for survival data with a cure fraction. ( 0,606434050485714 )
Brief. Bioinformatics - An empirical assessment of validation practices for molecular classifiers. ( 0,605642589816182 )
IEEE Trans Image Process - Incremental N-mode SVD for large-scale multilinear generative models. ( 0,601871418068789 )
J Chem Inf Model - Applicability Domain ANalysis (ADAN): a robust method for assessing the reliability of drug property predictions. ( 0,60013043765065 )
Med Decis Making - Prediction of health preference values from CD4 counts in individuals with HIV. ( 0,598707211326004 )
J Chem Inf Model - Combined 3D-QSAR, molecular docking, and molecular dynamics study on piperazinyl-glutamate-pyridines/pyrimidines as potent P2Y12 antagonists for inhibition of platelet aggregation. ( 0,596900446280051 )
J Chem Inf Model - Applicability domains for classification problems: Benchmarking of distance to models for Ames mutagenicity set. ( 0,59638613762505 )
Brief. Bioinformatics - Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies. ( 0,595408832237569 )
AMIA Annu Symp Proc - Identifying Deviations from Usual Medical Care using a Statistical Approach. ( 0,593865132882529 )
J Chem Inf Model - Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. ( 0,593013976041555 )
J Chem Inf Model - Optimizing predictive performance of CASE Ultra expert system models using the applicability domains of individual toxicity alerts. ( 0,592330292695412 )
J Chem Inf Model - Building a three-dimensional model of CYP2C9 inhibition using the Autocorrelator: an autonomous model generator. ( 0,58964119824186 )
Int J Health Geogr - A linear programming model for preserving privacy when disclosing patient spatial information for secondary purposes. ( 0,588850272223175 )
J Chem Inf Model - Quantitative structure-activity relationship models for ready biodegradability of chemicals. ( 0,587203856868004 )
J Chem Inf Model - Coping with unbalanced class data sets in oral absorption models. ( 0,585469020291503 )
J Chem Inf Model - Automated building of organometallic complexes from 3D fragments. ( 0,583973534143812 )
Comput Methods Programs Biomed - Predicting body fat percentage based on gender, age and BMI by using artificial neural networks. ( 0,583201651639509 )
J Med Syst - Utilization of electronic medical records to build a detection model for surveillance of healthcare-associated urinary tract infections. ( 0,577690012214013 )
J Chem Inf Model - Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection. ( 0,577188145929117 )
Med Decis Making - Constructing proper ROCs from ordinal response data using weighted power functions. ( 0,574139966177331 )
AMIA Annu Symp Proc - Order sets in computerized physician order entry systems: an analysis of seven sites. ( 0,573942577538842 )
Comput Methods Programs Biomed - A mobile application for cognitive screening of dementia. ( 0,573520429851925 )
AMIA Annu Symp Proc - Ontology-based federated data access to human studies information. ( 0,572952275585594 )
Comput Methods Programs Biomed - Interstitial insulin kinetic parameters for a 2-compartment insulin model with saturable clearance. ( 0,571511562520755 )
J Biomed Inform - Transfer learning based clinical concept extraction on data from multiple sources. ( 0,569580949631301 )
J Chem Inf Model - Design and synthesis of new antioxidants predicted by the model developed on a set of pulvinic acid derivatives. ( 0,567217218829881 )
J Chem Inf Model - Design of novel FLT-3 inhibitors based on dual-layer 3D-QSAR model and fragment-based compounds in silico. ( 0,566661120623739 )
J Chem Inf Model - How accurately can we predict the melting points of drug-like compounds? ( 0,56544555286802 )
J Chem Inf Model - Analysis and study of molecule data sets using snowflake diagrams of weighted maximum common subgraph trees. ( 0,565158822402247 )
Comput. Biol. Med. - Quantification of contributions of molecular fragments for eye irritation of organic chemicals using QSAR study. ( 0,564443063749259 )
J Chem Inf Model - Template CoMFA applied to 116 biological targets. ( 0,563956566291761 )