J Chem Inf Model - Rank order entropy: why one metric is not enough.

Topics

{ model(2656) set(1616) predict(1553) }
{ perform(999) metric(946) measur(919) }
{ design(1359) user(1324) use(1319) }
{ take(945) account(800) differ(722) }
{ can(774) often(719) complex(702) }
{ bind(1733) structur(1185) ligand(1036) }
{ concept(1167) ontolog(924) domain(897) }
{ howev(809) still(633) remain(590) }
{ visual(1396) interact(850) tool(830) }
{ chang(1828) time(1643) increas(1301) }
{ compound(1573) activ(1297) structur(1058) }
{ cost(1906) reduc(1198) effect(832) }
{ measur(2081) correl(1212) valu(896) }
{ assess(1506) score(1403) qualiti(1306) }
{ model(2220) cell(1177) simul(1124) }
{ research(1085) discuss(1038) issu(1018) }
{ import(1318) role(1303) understand(862) }
{ risk(3053) factor(974) diseas(938) }
{ system(1050) medic(1026) inform(1018) }
{ sampl(1606) size(1419) use(1276) }
{ time(1939) patient(1703) rate(768) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ framework(1458) process(801) describ(734) }
{ data(1714) softwar(1251) tool(1186) }
{ control(1307) perform(991) simul(935) }
{ state(1844) use(1261) util(961) }
{ age(1611) year(1155) adult(843) }
{ high(1669) rate(1365) level(1280) }
{ error(1145) method(1030) estim(1020) }
{ learn(2355) train(1041) set(1003) }
{ extract(1171) text(1153) clinic(932) }
{ featur(1941) imag(1645) propos(1176) }
{ model(2341) predict(2261) use(1141) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ model(3480) simul(1196) paramet(876) }
{ group(2977) signific(1463) compar(1072) }
{ data(3008) multipl(1320) sourc(1022) }
{ intervent(3218) particip(2042) group(1664) }
{ analysi(2126) use(1163) compon(1037) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ detect(2391) sensit(1101) algorithm(908) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ problem(2511) optim(1539) algorithm(950) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ perform(1367) use(1326) method(1137) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ gene(2352) biolog(1181) express(1162) }
{ first(2504) two(1366) second(1323) }
{ activ(1138) subject(705) human(624) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ implement(1333) system(1263) develop(1122) }
{ estim(2440) model(1874) function(577) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }

Abstract

The use of Quantitative Structure-Activity Relationship (QSAR) models to address problems in drug discovery has a mixed history, generally resulting from the misapplication of QSAR models that were either poorly constructed or used outside of their domains of applicability. This situation has motivated the development of a variety of model performance metrics (r², PRESS r², F-tests, etc.) designed to increase user confidence in the validity of QSAR predictions. In a typical workflow, QSAR models are created and validated on training sets of molecules using methods such as Leave-One-Out or many-fold cross-validation that attempt to assess their internal consistency. However, few current validation methods are designed to directly address the stability of QSAR predictions in response to changes in the information content of the training set. Since the main purpose of QSAR is to quickly and accurately estimate a property of interest for an untested set of molecules, it makes sense to have a means at hand to correctly set user expectations of model performance. In fact, the numerical value of a molecular prediction is often less important to the end user than knowing the rank order of that set of molecules according to their predicted end point values. Consequently, a means for characterizing the stability of predicted rank order is an important component of predictive QSAR. Unfortunately, none of the many validation metrics currently available directly measures the stability of rank order prediction, making the development of an additional metric that can quantify model stability a high priority. To address this need, this work examines the stability of QSAR rank order models created from representative data sets, descriptor sets, and modeling methods. The models were assessed using Kendall tau as a rank order metric, on which the Shannon entropy was then evaluated as a means of quantifying rank order stability.
Random removal of data from the training set, also known as Data Truncation Analysis (DTA), was used to systematically reduce the information content of each training set while examining both rank order performance and rank order stability in the face of training set data loss. This process is termed a "rank order entropy" (ROE) evaluation. The premise of DTA ROE model evaluation is that the response of a model to incremental loss of training information indicates the quality and sufficiency of its training set, learning method, and descriptor types to cover a particular domain of applicability. By analogy with information theory, an unstable rank order model displays a high level of implicit entropy, while a QSAR rank order model that remains nearly unchanged during training set reductions shows low entropy. In this work, the ROE metric was applied to 71 data sets of different sizes and was found to reveal more information about the behavior of the models than traditional metrics alone. Stable, or consistently performing, models did not necessarily predict rank order well, and models that performed well in rank order did not necessarily perform well on traditional metrics. In the end, the ROE metrics suggested that some QSAR models that are typically used should be discarded. ROE evaluation helps to discern which combinations of data set, descriptor set, and modeling method lead to usable models in prioritization schemes, and it provides confidence in the use of a particular model within a specific domain of applicability.
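
The procedure outlined in the abstract (truncate the training set at random, refit, score rank order with Kendall tau, then take the Shannon entropy of the resulting tau values) can be illustrated with a minimal Python sketch. The abstract does not specify the learner, the truncation schedule, the number of repetitions, or how tau values are binned before the entropy is computed, so everything below is an assumption made for illustration: the toy k-nearest-neighbour model, the synthetic one-descriptor data, the 10-bin histogram on [-1, 1], and all function names (`kendall_tau`, `shannon_entropy`, `knn_predict`, `truncation_trial`, `make_toy_data`) are hypothetical and are not the authors' implementation.

```python
import math
import random
from itertools import combinations


def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) / total pairs; tied pairs add 0."""
    score = 0
    for i, j in combinations(range(len(a)), 2):
        product = (a[i] - a[j]) * (b[i] - b[j])
        score += (product > 0) - (product < 0)
    return score / (len(a) * (len(a) - 1) / 2)


def shannon_entropy(values, bins=10, lo=-1.0, hi=1.0):
    """Shannon entropy in bits of a fixed-width histogram of values on [lo, hi].

    The bin count and range are assumptions; the abstract does not give them.
    """
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / (hi - lo) * bins), bins - 1)] += 1
    probs = [c / len(values) for c in counts if c]
    return sum(p * math.log2(1 / p) for p in probs)


def knn_predict(train, x, k=3):
    """Toy stand-in for a QSAR learner: mean activity of the k nearest neighbours."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def truncation_trial(train, test, keep_fraction, n_trials=50, seed=0):
    """Randomly truncate the training set n_trials times.

    Returns (mean Kendall tau, Shannon entropy of the tau values) on the test set.
    """
    rng = random.Random(seed)
    n_keep = max(1, round(len(train) * keep_fraction))
    observed = [y for _, y in test]
    taus = []
    for _ in range(n_trials):
        subset = rng.sample(train, n_keep)  # random removal of training data
        predicted = [knn_predict(subset, x) for x, _ in test]
        taus.append(kendall_tau(predicted, observed))
    return sum(taus) / len(taus), shannon_entropy(taus)


def make_toy_data(n=60, seed=1):
    """Synthetic (descriptor, activity) pairs: a noisy linear trend (not real data)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0, 2.0)) for x in xs]


if __name__ == "__main__":
    data = make_toy_data()
    train, test = data[:40], data[40:]
    for fraction in (1.0, 0.75, 0.5, 0.25, 0.1):
        mean_tau, entropy = truncation_trial(train, test, fraction)
        print(f"keep={fraction:.2f}  mean tau={mean_tau:+.3f}  entropy={entropy:.3f} bits")
```

In this sketch the mean tau plays the role of rank order performance and the entropy plays the role of rank order stability, so the two can be read side by side at each truncation level. With the full training set every trial yields the same tau and the entropy is zero; how quickly it rises as data are removed depends on the assumed model and binning.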

Cleaned Abstract

use quantit structureact relationship model address problem drug discoveri mix histori general result misappl qsar model either poor construct use outsid domain applic situat motiv develop varieti model perform metric r press r ftest etc design increas user confid valid qsar predict typic workflow scenario qsar model creat valid train set molecul use metric leaveoneout manyfold crossvalid method attempt assess intern consist howev current valid method design direct address stabil qsar predict respons chang inform content train set sinc main purpos qsar quick accur estim properti interest untest set molecul make sens mean hand correct set user expect model perform fact numer valu molecular predict often less import end user know rank order set molecul accord predict end point valu consequ mean character stabil predict rank order import compon predict qsar unfortun none mani valid metric current avail direct measur stabil rank order predict make develop addit metric can quantifi model stabil high prioriti address need work examin stabil qsar rank order model creat repres data set descriptor set model method assess use kendal tau rank order metric upon shannon entropi evalu mean quantifi rankord stabil random remov data train set also known data truncat analysi dta use mean systemat reduc inform content train set examin rank order perform rank order stabil face train set data loss premis dta roe model evalu respons model increment loss train inform will indic qualiti suffici train set learn method descriptor type cover particular domain applic process term rank order entropi evalu roe analog inform theori unstabl rank order model display high level implicit entropi qsar rank order model remain near unchang train set reduct show low entropi work roe metric appli data set differ size found reveal inform behavior model tradit metric alon stabl consist perform model necessarili predict rank order well model perform well rank order necessarili perform well tradit metric 
end shown roe metric suggest qsar model typic use discard roe evalu help discern combin data set descriptor set model method lead usabl model priorit scheme provid confid use particular model within specif domain applic

Similar Abstracts

J Chem Inf Model - Applicability domains for classification problems: Benchmarking of distance to models for Ames mutagenicity set. ( 0,80465024723243 )
J Chem Inf Model - Time-split cross-validation as a method for estimating the goodness of prospective prediction. ( 0,789664793156456 )
Artif Intell Med - Training artificial neural networks directly on the concordance index for censored data using genetic algorithms. ( 0,766505837252999 )
J Chem Inf Model - GRID-based three-dimensional pharmacophores II: PharmBench, a benchmark data set for evaluating pharmacophore elucidation methods. ( 0,763304935902436 )
J Chem Inf Model - Pharmacophore assessment through 3-D QSAR: evaluation of the predictive ability on new derivatives by the application on a series of antitubercular agents. ( 0,749015940042101 )
J Chem Inf Model - Study of chromatographic retention of natural terpenoids by chemoinformatic tools. ( 0,746889876080331 )
AMIA Annu Symp Proc - Effect of data combination on predictive modeling: a study using gene expression data. ( 0,745222561704456 )
J Chem Inf Model - Does rational selection of training and test sets improve the outcome of QSAR modeling? ( 0,738762600708751 )
J Chem Inf Model - iLOGP: a simple, robust, and efficient description of n-octanol/water partition coefficient for drug design using the GB/SA approach. ( 0,726874433164585 )
J Chem Inf Model - Comparative studies on some metrics for external validation of QSPR models. ( 0,716963517696451 )
BMC Med Inform Decis Mak - Concordance and predictive value of two adverse drug event data sets. ( 0,710958712036357 )
J. Comput. Biol. - The complexity of the dirichlet model for multiple alignment data. ( 0,698879026183984 )
AMIA Annu Symp Proc - Advanced proficiency EHR training: effect on physicians' EHR efficiency, EHR satisfaction and job satisfaction. ( 0,695166663526872 )
BMC Med Inform Decis Mak - Measuring preferences for analgesic treatment for cancer pain: how do African-Americans and Whites perform on choice-based conjoint (CBC) analysis experiments? ( 0,684906782920278 )
AMIA Annu Symp Proc - Motivating the additional use of external validity: examining transportability in a model of glioblastoma multiforme. ( 0,684727085829193 )
J Chem Inf Model - RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes. ( 0,683866888639319 )
J Chem Inf Model - Beyond the scope of Free-Wilson analysis: building interpretable QSAR models with machine learning algorithms. ( 0,68300170914024 )
J Chem Inf Model - Predicting pKa values of substituted phenols from atomic charges: comparison of different quantum mechanical methods and charge distribution schemes. ( 0,681553641638177 )
BMC Med Inform Decis Mak - Regression tree construction by bootstrap: model search for DRG-systems applied to Austrian health-data. ( 0,666504372577028 )
Int J Health Geogr - Incorporating geographical factors with artificial neural networks to predict reference values of erythrocyte sedimentation rate. ( 0,658766619545886 )
AMIA Annu Symp Proc - Predicting the dengue incidence in Singapore using univariate time series models. ( 0,656583767990823 )
J Chem Inf Model - Impact of template choice on homology model efficiency in virtual screening. ( 0,656416034228358 )
J Chem Inf Model - Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets. ( 0,652178682077587 )
J Biomed Inform - Transfer learning based clinical concept extraction on data from multiple sources. ( 0,650544322414428 )
J Chem Inf Model - Three useful dimensions for domain applicability in QSAR models using random forest. ( 0,647429727019382 )
J Biomed Inform - MysiRNA: improving siRNA efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy (ΔG). ( 0,642835721725208 )
J Am Med Inform Assoc - Harvest: an open platform for developing web-based biomedical data discovery and reporting applications. ( 0,636750548921634 )
Comput. Aided Surg. - Evaluation of a computational model to predict elbow range of motion. ( 0,636501989071824 )
Med Biol Eng Comput - Optimal design of clinical tests for the identification of physiological models of type 1 diabetes in the presence of model mismatch. ( 0,635366730954644 )
Med Decis Making - Developing a tuberculosis transmission model that accounts for changes in population health. ( 0,635151645096797 )
J Chem Inf Model - Prediction of linear cationic antimicrobial peptides based on characteristics responsible for their interaction with the membranes. ( 0,625877040158012 )
Int J Health Geogr - A linear programming model for preserving privacy when disclosing patient spatial information for secondary purposes. ( 0,624715622500652 )
J Chem Inf Model - CSAR data set release 2012: ligands, affinities, complexes, and docking decoys. ( 0,621559013883565 )
Comput Methods Programs Biomed - A predictive model of longitudinal, patient-specific colonoscopy results. ( 0,621244274697988 )
Int J Comput Assist Radiol Surg - Assessing performance in brain tumor resection using a novel virtual reality simulator. ( 0,619350502257364 )
J Chem Inf Model - Statistical analysis and compound selection of combinatorial libraries for soluble epoxide hydrolase. ( 0,617218639782614 )
J. Med. Internet Res. - A case study of the New York City 2012-2013 influenza season with daily geocoded Twitter data from temporal and spatiotemporal perspectives. ( 0,616647205521273 )
Med Biol Eng Comput - Application of the RIMARC algorithm to a large data set of action potentials and clinical parameters for risk prediction of atrial fibrillation. ( 0,615082602747108 )
J Chem Inf Model - A new approach to radial basis function approximation and its application to QSAR. ( 0,613818383334798 )
J Chem Inf Model - Best of both worlds: combining pharma data and state of the art modeling technology to improve in Silico pKa prediction. ( 0,613639497673491 )
Artif Intell Med - Fuzzy model identification of dengue epidemic in Colombia based on multiresolution analysis. ( 0,613589324537666 )
Int J Health Geogr - Comparative analysis of remotely-sensed data products via ecological niche modeling of avian influenza case occurrences in Middle Eastern poultry. ( 0,609726122978352 )
J Chem Inf Model - Introducing conformal prediction in predictive modeling. A transparent and flexible alternative to applicability domain determination. ( 0,609225465552193 )
Lifetime Data Anal - Analysis of cure rate survival data under proportional odds model. ( 0,606614012456587 )
J Am Med Inform Assoc - Choosing blindly but wisely: differentially private solicitation of DNA datasets for disease marker discovery. ( 0,605997235196886 )
Comput Methods Programs Biomed - Predicting body fat percentage based on gender, age and BMI by using artificial neural networks. ( 0,602426773522193 )
J Chem Inf Model - In silico prediction of aqueous solubility using simple QSPR models: the importance of phenol and phenol-like moieties. ( 0,602093544120398 )
J Chem Inf Model - Applicability domain based on ensemble learning in classification and regression analyses. ( 0,601033507560462 )
Comput Methods Programs Biomed - Kinetic modelling of haemodialysis removal of myoglobin in rhabdomyolysis patients. ( 0,59129721277506 )
IEEE Trans Image Process - Incremental N-mode SVD for large-scale multilinear generative models. ( 0,591199854771472 )
J Chem Inf Model - Robust scoring functions for protein-ligand interactions with quantum chemical charge models. ( 0,588479288252654 )
J Chem Inf Model - Four-dimensional structure-activity relationship model to predict HIV-1 integrase strand transfer inhibition using LQTA-QSAR methodology. ( 0,58711945709597 )
J Chem Inf Model - Hsp90 inhibitors, part 1: definition of 3-D QSAutogrid/R models as a tool for virtual screening. ( 0,585869390318768 )
J Chem Inf Model - Optimizing predictive performance of CASE Ultra expert system models using the applicability domains of individual toxicity alerts. ( 0,585672930401837 )
J Chem Inf Model - Applicability Domain ANalysis (ADAN): a robust method for assessing the reliability of drug property predictions. ( 0,581950699515306 )
J Chem Inf Model - Combined 3D-QSAR, molecular docking, and molecular dynamics study on piperazinyl-glutamate-pyridines/pyrimidines as potent P2Y12 antagonists for inhibition of platelet aggregation. ( 0,58152125776919 )
Spat Spatiotemporal Epidemiol - Spatial modelling of disease using data- and knowledge-driven approaches. ( 0,580151375376333 )
J. Med. Internet Res. - Outsourcing medical data analyses: can technology overcome legal, privacy, and confidentiality issues? ( 0,57869690071953 )
J Chem Inf Model - Real external predictivity of QSAR models. Part 2. New intercomparable thresholds for different validation criteria and the need for scatter plot inspection. ( 0,578290781958801 )
J Chem Inf Model - In silico prediction of total human plasma clearance. ( 0,57827772645158 )
Int J Med Inform - Design and implementation of I2Vote--an interactive image-based voting system using windows mobile devices. ( 0,57792698861032 )
Comput. Biol. Med. - Artificial neural network modelling of the results of tympanoplasty in chronic suppurative otitis media patients. ( 0,577003694244971 )
J Chem Inf Model - Ligand and structure-based classification models for prediction of P-glycoprotein inhibitors. ( 0,575133427351656 )
Comput Math Methods Med - Multiscale autoregressive identification of neuroelectrophysiological systems. ( 0,57497487239231 )
J Chem Inf Model - Classification of compounds with distinct or overlapping multi-target activities and diverse molecular mechanisms using emerging chemical patterns. ( 0,574862825483722 )
J Chem Inf Model - Development of novel 3D-QSAR combination approach for screening and optimizing B-Raf inhibitors in silico. ( 0,574241490409025 )
Comput. Biol. Med. - A prediction model of substrates and non-substrates of breast cancer resistance protein (BCRP) developed by GA-CG-SVM method. ( 0,573035244371362 )
Brief. Bioinformatics - Rediscovery rate estimation for assessing the validation of significant findings in high-throughput studies. ( 0,571718878934287 )
J Am Med Inform Assoc - Use of a support vector machine for categorizing free-text notes: assessment of accuracy across two institutions. ( 0,569441603160677 )
J Chem Inf Model - In silico prediction of chemical Ames mutagenicity. ( 0,564906605091477 )
J Chem Inf Model - Stochastic proximity embedding on graphics processing units: taking multidimensional scaling to a new scale. ( 0,558227858677969 )
J Chem Inf Model - A multiscale simulation system for the prediction of drug-induced cardiotoxicity. ( 0,555989390453212 )
Med Decis Making - Prediction of health preference values from CD4 counts in individuals with HIV. ( 0,555079693678452 )
J Chem Inf Model - Coping with unbalanced class data sets in oral absorption models. ( 0,553429636268867 )
J Chem Inf Model - Criterion for evaluating the predictive ability of nonlinear regression models without cross-validation. ( 0,551615640579673 )
J Chem Inf Model - Real external predictivity of QSAR models: how to evaluate it? Comparison of different validation criteria and proposal of using the concordance correlation coefficient. ( 0,550850868137655 )
Neural Comput - Molecular diffusion model of neurotransmitter homeostasis around synapses supporting gradients. ( 0,549087457720203 )
J Chem Inf Model - DrugPred: a structure-based approach to predict protein druggability developed using an extensive nonredundant data set. ( 0,548303098721601 )
IEEE Trans Image Process - Neighborhood Supported Model Level Fuzzy Aggregation for Moving Object Segmentation. ( 0,548171959130358 )
J Chem Inf Model - Building a three-dimensional model of CYP2C9 inhibition using the Autocorrelator: an autonomous model generator. ( 0,547144375094354 )
AMIA Annu Symp Proc - Ontology-based federated data access to human studies information. ( 0,545429948645633 )
J Chem Inf Model - Estimation of carcinogenicity using molecular fragments tree. ( 0,544323177041279 )
Med Biol Eng Comput - Development of a comprehensive musculoskeletal model of the shoulder and elbow. ( 0,540883007575371 )
Artif Intell Med - Image partitioning and illumination in image-based pose detection for teleoperated flexible endoscopes. ( 0,540073068649283 )
Neural Comput - Improved similarity measures for small sets of spike trains. ( 0,539472839168085 )
Comput. Biol. Med. - Quantification of contributions of molecular fragments for eye irritation of organic chemicals using QSAR study. ( 0,539023005489873 )
J Chem Inf Model - Development of the knowledge-based and empirical combined scoring algorithm (KECSA) to score protein-ligand interactions. ( 0,536394017010003 )
Comput Methods Programs Biomed - Interstitial insulin kinetic parameters for a 2-compartment insulin model with saturable clearance. ( 0,536282968680146 )
AMIA Annu Symp Proc - Identifying Deviations from Usual Medical Care using a Statistical Approach. ( 0,535785311542462 )
AMIA Annu Symp Proc - Order sets in computerized physician order entry systems: an analysis of seven sites. ( 0,535371757266134 )
J Chem Inf Model - Binary classification of a large collection of environmental chemicals from estrogen receptor assays by quantitative structure-activity relationship and machine learning methods. ( 0,535260034776459 )
J Chem Inf Model - Design and synthesis of new antioxidants predicted by the model developed on a set of pulvinic acid derivatives. ( 0,533927386735979 )
Telemed J E Health - Measurement adherence in the blood pressure self-measurement room. ( 0,533893704155379 )
Comput Methods Programs Biomed - Bayesian bivariate generalized Lindley model for survival data with a cure fraction. ( 0,533727479067025 )
J Chem Inf Model - Oversampling to overcome overfitting: exploring the relationship between data set composition, molecular descriptors, and predictive modeling methods. ( 0,529048254808205 )
J Chem Inf Model - Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection. ( 0,526694915294505 )
J Chem Inf Model - Automated building of organometallic complexes from 3D fragments. ( 0,523773389015914 )
J Chem Inf Model - Analysis and study of molecule data sets using snowflake diagrams of weighted maximum common subgraph trees. ( 0,520861603868954 )
BMC Med Inform Decis Mak - Modeling healthcare authorization and claim submissions using the openEHR dual-model approach. ( 0,518996999378212 )
J Chem Inf Model - Comparison of random forest and Pipeline Pilot Naïve Bayes in prospective QSAR predictions. ( 0,517299731823057 )