J Chem Inf Model - Using random forest to model the domain applicability of another random forest model.

Topics

{ error(1145) method(1030) estim(1020) }
{ model(2341) predict(2261) use(1141) }
{ compound(1573) activ(1297) structur(1058) }
{ model(2656) set(1616) predict(1553) }
{ use(1733) differ(960) four(931) }
{ perform(999) metric(946) measur(919) }
{ result(1111) use(1088) new(759) }
{ learn(2355) train(1041) set(1003) }
{ method(1557) propos(1049) approach(1037) }
{ case(1353) use(1143) diagnosi(1136) }
{ system(1050) medic(1026) inform(1018) }
{ take(945) account(800) differ(722) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ general(901) number(790) one(736) }
{ featur(1941) imag(1645) propos(1176) }
{ data(3963) clinic(1234) research(1004) }
{ data(2317) use(1299) case(1017) }
{ can(774) often(719) complex(702) }
{ motion(1329) object(1292) video(1091) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ chang(1828) time(1643) increas(1301) }
{ algorithm(1844) comput(1787) effici(935) }
{ control(1307) perform(991) simul(935) }
{ research(1085) discuss(1038) issu(1018) }
{ studi(1119) effect(1106) posit(819) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ age(1611) year(1155) adult(843) }
{ signal(2180) analysi(812) frequenc(800) }
{ data(3008) multipl(1320) sourc(1022) }
{ intervent(3218) particip(2042) group(1664) }
{ use(976) code(926) identifi(902) }
{ implement(1333) system(1263) develop(1122) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ extract(1171) text(1153) clinic(932) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ howev(809) still(633) remain(590) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ perform(1367) use(1326) method(1137) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ patient(2837) hospit(1953) medic(668) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ sampl(1606) size(1419) use(1276) }
{ gene(2352) biolog(1181) express(1162) }
{ first(2504) two(1366) second(1323) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ drug(1928) target(777) effect(648) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

In QSAR, a statistical model is generated from a training set of molecules (represented by chemical descriptors) and their biological activities. We will call this traditional type of QSAR model an "activity model". The activity model can be used to predict the activities of molecules not in the training set. A relatively new subfield of QSAR is domain applicability, which aims to estimate the reliability of the prediction for a specific molecule by a specific activity model. A number of different metrics have been proposed in the literature for this purpose. It is desirable to build a quantitative model of reliability against one or more of these metrics; we call this an "error model". A previous publication from our laboratory (Sheridan, J. Chem. Inf. Model., 2012, 52, 814-823) suggested that the simultaneous use of three metrics would be more discriminating than any one metric. An error model could then be built in the form of a three-dimensional set of bins. When the number of metrics exceeds three, however, the bin paradigm is not practical. An obvious solution for constructing an error model using multiple metrics is to use a QSAR method, in our case random forest. In this paper we demonstrate the usefulness of this paradigm, specifically for determining whether a useful error model can be built and which metrics are most useful for a given problem. For the ten data sets and seven metrics we examine here, it appears possible to construct a useful error model using only two metrics (TREE_SD and PREDICTED). These two metrics do not require calculating similarities/distances between the molecules being predicted and the molecules used to build the activity model, a calculation that can be rate-limiting.
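The two-metric error model described in the abstract can be illustrated with a minimal, standard-library-only sketch. Several things here are assumptions rather than details from the paper: TREE_SD is taken to be the standard deviation of the individual tree predictions for a molecule, PREDICTED is taken to be their mean, and the error model is a simple two-dimensional set of bins holding the RMSE observed in each bin. The paper itself builds the error model with random forest; bins are used here only because the abstract notes they remain practical for up to three metrics. All function names, bin edges, and data values are hypothetical.

```python
import math
import statistics
from collections import defaultdict


def predicted_and_tree_sd(per_tree_predictions):
    """Return (PREDICTED, TREE_SD) for one molecule.

    Assumption: PREDICTED is the mean of the per-tree predictions and
    TREE_SD is their population standard deviation.
    """
    predicted = statistics.fmean(per_tree_predictions)
    tree_sd = statistics.pstdev(per_tree_predictions)
    return predicted, tree_sd


def bin_index(value, edges):
    """Index of the bin holding value, given ascending interior bin edges."""
    return sum(value >= edge for edge in edges)


def fit_binned_error_model(rows, predicted_edges, tree_sd_edges):
    """Build {(predicted_bin, tree_sd_bin): RMSE} from validation molecules.

    rows is an iterable of (predicted, tree_sd, observed) tuples.
    """
    squared_errors = defaultdict(list)
    for predicted, tree_sd, observed in rows:
        key = (bin_index(predicted, predicted_edges),
               bin_index(tree_sd, tree_sd_edges))
        squared_errors[key].append((observed - predicted) ** 2)
    return {key: math.sqrt(statistics.fmean(values))
            for key, values in squared_errors.items()}


def expected_error(model, predicted, tree_sd, predicted_edges, tree_sd_edges):
    """Look up the expected RMSE for a new prediction (None for an empty bin)."""
    key = (bin_index(predicted, predicted_edges),
           bin_index(tree_sd, tree_sd_edges))
    return model.get(key)


# Hypothetical bin edges and validation data: (predicted, tree_sd, observed).
PREDICTED_EDGES = [6.0]
TREE_SD_EDGES = [0.5]
rows = [
    (5.0, 0.1, 5.5),
    (5.2, 0.2, 4.7),
    (5.1, 0.9, 7.1),
    (5.3, 1.1, 3.3),
]
error_model = fit_binned_error_model(rows, PREDICTED_EDGES, TREE_SD_EDGES)

# A new molecule, described only by its (hypothetical) per-tree predictions.
new_predicted, new_tree_sd = predicted_and_tree_sd([5.0, 5.4, 4.9, 5.1])
print(expected_error(error_model, new_predicted, new_tree_sd,
                     PREDICTED_EDGES, TREE_SD_EDGES))
```

Both metrics come directly from the activity model's own output, so no similarity or distance to the training molecules has to be computed, which is the practical advantage the abstract highlights.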

Cleaned Abstract

qsar statist model generat train set molecul repres chemic descriptor biolog activ will call tradit type qsar model activ model activ model can use predict activ molecul train set relat new subfield qsar domain applic aim estim reliabl predict specif molecul specif activ model number differ metric propos literatur purpos desir build quantit model reliabl one metric can call error model previous public laboratori sheridan j chem inf model suggest simultan use three metric discrimin one metric error model built form threedimension set bin number metric exceed three howev bin paradigm practic obvious solut construct error model use multipl metric use qsar method case random forest paper demonstr use paradigm specif determin whether use error model can built metric use given problem ten data set seven metric examin appear possibl construct use error model use two metric treesd predict requir calcul similaritiesdist molecul predict molecul use build activ model can ratelimit

Similar Abstracts

J Chem Inf Model - Experimental and computational prediction of glass transition temperature of drugs. ( 0,801063026978729 )
J Chem Inf Model - Capturing the crystal: prediction of enthalpy of sublimation, crystal lattice energy, and melting points of organic compounds. ( 0,694763928946301 )
J Chem Inf Model - QSAR modeling of imbalanced high-throughput screening data in PubChem. ( 0,669505950037767 )
J Chem Inf Model - FAst MEtabolizer (FAME): A rapid and accurate predictor of sites of metabolism in multiple species by endogenous enzymes. ( 0,649370550322343 )
J Clin Monit Comput - Effect of concurrent oxygen therapy on accuracy of forecasting imminent postoperative desaturation. ( 0,646440612354846 )
J Chem Inf Model - Applicability domains for classification problems: Benchmarking of distance to models for Ames mutagenicity set. ( 0,642696180032217 )
J Chem Inf Model - Analysis and study of molecule data sets using snowflake diagrams of weighted maximum common subgraph trees. ( 0,633672425770607 )
J Chem Inf Model - Development of novel 3D-QSAR combination approach for screening and optimizing B-Raf inhibitors in silico. ( 0,627193418967645 )
J Chem Inf Model - Ligand efficiency-based support vector regression models for predicting bioactivities of ligands to drug target proteins. ( 0,625489650934382 )
J Chem Inf Model - How accurately can we predict the melting points of drug-like compounds? ( 0,619652925379061 )
J Chem Inf Model - Predictive models for cytochrome p450 isozymes based on quantitative high throughput screening data. ( 0,618963548931816 )
J Chem Inf Model - Interpretable, probability-based confidence metric for continuous quantitative structure-activity relationship models. ( 0,607542527437414 )
J Chem Inf Model - Estimating error rates in bioactivity databases. ( 0,607447117231211 )
J Chem Inf Model - Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. ( 0,601638057597836 )
Comput Math Methods Med - Predictive models for maximum recommended therapeutic dose of antiretroviral drugs. ( 0,594889906792536 )
J Chem Inf Model - Using information from historical high-throughput screens to predict active compounds. ( 0,594292189041914 )
J Chem Inf Model - Profile-QSAR: a novel meta-QSAR method that combines activities across the kinase family to accurately predict affinity, selectivity, and cellular activity. ( 0,588730819071713 )
J Chem Inf Model - Predictive toxicology modeling: protocols for exploring hERG classification and Tetrahymena pyriformis end point predictions. ( 0,582250399424325 )
Artif Intell Med - Predicting patient survival after liver transplantation using evolutionary multi-objective artificial neural networks. ( 0,580660766499074 )
J Chem Inf Model - Discovering new agents active against methicillin-resistant Staphylococcus aureus with ligand-based approaches. ( 0,577849566662342 )
BMC Med Inform Decis Mak - Bayesian predictors of very poor health related quality of life and mortality in patients with COPD. ( 0,575173046424725 )
Comput. Biol. Med. - An effective measure for assessing the quality of biclusters. ( 0,569313091035105 )
J Chem Inf Model - A new approach to radial basis function approximation and its application to QSAR. ( 0,565916860280056 )
J Chem Inf Model - Classification of compounds with distinct or overlapping multi-target activities and diverse molecular mechanisms using emerging chemical patterns. ( 0,565405635420827 )
J Chem Inf Model - Prediction of compound potency changes in matched molecular pairs using support vector regression. ( 0,563838117903591 )
J Chem Inf Model - Merging applicability domains for in silico assessment of chemical mutagenicity. ( 0,563655859543782 )
Med Decis Making - Constructing proper ROCs from ordinal response data using weighted power functions. ( 0,559915320043435 )
J Chem Inf Model - Benchmarking study of parameter variation when using signature fingerprints together with support vector machines. ( 0,556910272146935 )
J Chem Inf Model - Comparison of random forest and Pipeline Pilot Naïve Bayes in prospective QSAR predictions. ( 0,55648466642585 )
Int J Health Geogr - Prediction of high-risk areas for visceral leishmaniasis using socioeconomic indicators and remote sensing data. ( 0,556420133413277 )
J Chem Inf Model - Capturing structure-activity relationships from chemogenomic spaces. ( 0,555934685697938 )
J Chem Inf Model - Binary classification of a large collection of environmental chemicals from estrogen receptor assays by quantitative structure-activity relationship and machine learning methods. ( 0,552411613385496 )
J Chem Inf Model - Best of both worlds: combining pharma data and state of the art modeling technology to improve in Silico pKa prediction. ( 0,551551711511461 )
Comput Math Methods Med - Screening for prediabetes using machine learning models. ( 0,550476848297196 )
BMC Med Inform Decis Mak - Regression tree construction by bootstrap: model search for DRG-systems applied to Austrian health-data. ( 0,54894527655608 )
J Chem Inf Model - Statistical analysis and compound selection of combinatorial libraries for soluble epoxide hydrolase. ( 0,548753769827128 )
J Chem Inf Model - In silico prediction of chemical Ames mutagenicity. ( 0,548246933087862 )
J Chem Inf Model - Predictions of BuChE inhibitors using support vector machine and naive Bayesian classification techniques in drug discovery. ( 0,547277411920742 )
J Chem Inf Model - Design and synthesis of new antioxidants predicted by the model developed on a set of pulvinic acid derivatives. ( 0,543940116706127 )
J Chem Inf Model - Predicting myelosuppression of drugs from in silico models. ( 0,542920599356838 )
Int J Comput Assist Radiol Surg - Optimized order estimation for autoregressive models to predict respiratory motion. ( 0,542588852830894 )
J Chem Inf Model - Profile-QSAR and Surrogate AutoShim protein-family modeling of proteases. ( 0,542405452326657 )
J Am Med Inform Assoc - A novel method of adverse event detection can accurately identify venous thromboembolisms (VTEs) from narrative electronic health record data. ( 0,542309958516488 )
J Chem Inf Model - Three useful dimensions for domain applicability in QSAR models using random forest. ( 0,542302411700167 )
Comput Biol Chem - Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions. ( 0,541924726671261 )
Comput Math Methods Med - Variable selection in ROC regression. ( 0,54072575390953 )
J Chem Inf Model - Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation. ( 0,539051110550882 )
J Chem Inf Model - Pragmatic approaches to using computational methods to predict xenobiotic metabolism. ( 0,538187890498682 )
J Chem Inf Model - Coping with unbalanced class data sets in oral absorption models. ( 0,537228567231977 )
J Chem Inf Model - A new protocol for predicting novel GSK-3β ATP competitive inhibitors. ( 0,536971081144726 )
J Chem Inf Model - Two new parameters based on distances in a receiver operating characteristic chart for the selection of classification models. ( 0,533750371610084 )
J Chem Inf Model - Construction and use of fragment-augmented molecular Hasse diagrams. ( 0,533045539893627 )
J Chem Inf Model - Revisiting the general solubility equation: in silico prediction of aqueous solubility incorporating the effect of topographical polar surface area. ( 0,532855844923485 )
J Chem Inf Model - Exploring polypharmacology using a ROCS-based target fishing approach. ( 0,531952333751834 )
J Chem Inf Model - Modeling drug-induced anorexia by molecular topology. ( 0,530202415267953 )
J Chem Inf Model - Structure based model for the prediction of phospholipidosis induction potential of small molecules. ( 0,529697398662354 )
J Chem Inf Model - Exploring uncharted territories: predicting activity cliffs in structure-activity landscapes. ( 0,529494462077312 )
J Am Med Inform Assoc - Drug repurposing: mining protozoan proteomes for targets of known bioactive compounds. ( 0,528353242384281 )
J Chem Inf Model - Hsp90 inhibitors, part 1: definition of 3-D QSAutogrid/R models as a tool for virtual screening. ( 0,527967887978497 )
J Chem Inf Model - DrugLogit: logistic discrimination between drugs and nondrugs including disease-specificity by assigning probabilities based on molecular properties. ( 0,526683395005422 )
Artif Intell Med - Prediction of human major histocompatibility complex class II binding peptides by continuous kernel discrimination method. ( 0,526659872385287 )
J Chem Inf Model - Combined receptor and ligand-based approach to the universal pharmacophore model development for studies of drug blockade to the hERG1 pore domain. ( 0,526240586469985 )
J Chem Inf Model - In silico prediction of aqueous solubility using simple QSPR models: the importance of phenol and phenol-like moieties. ( 0,524314349007235 )
Med Decis Making - Adaptation of clinical prediction models for application in local settings. ( 0,524296583440987 )
J Chem Inf Model - Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection. ( 0,523126397062955 )
J Chem Inf Model - Homology modeling of human muscarinic acetylcholine receptors. ( 0,522433538307184 )
J Chem Inf Model - Quantitative structure-activity relationship models of clinical pharmacokinetics: clearance and volume of distribution. ( 0,522290828621316 )
Med Decis Making - Performance of a mathematical model to forecast lives saved from HIV treatment expansion in resource-limited settings. ( 0,522251499442044 )
J. Comput. Biol. - Threshold group testing on inhibitor model. ( 0,522096403424468 )
Comput. Biol. Med. - The diagnosis of hypovolemia using advanced statistical methods. ( 0,521815505331312 )
AMIA Annu Symp Proc - Interoperability of medical databases: construction of mapping between hospitals laboratory results assisted by automated comparison of their distributions. ( 0,520489086392178 )
Comput Math Methods Med - Privacy-preserving restricted boltzmann machine. ( 0,517172358994849 )
IEEE Trans Image Process - Fast bi-directional prediction selection in H.264/MPEG-4 AVC temporal scalable video coding. ( 0,517109911349377 )
BMC Med Inform Decis Mak - A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study. ( 0,514868332255237 )
J Chem Inf Model - Predicting pK(a) values of substituted phenols from atomic charges: comparison of different quantum mechanical methods and charge distribution schemes. ( 0,514844539626418 )
J Chem Inf Model - Automated building of organometallic complexes from 3D fragments. ( 0,514687040444691 )
J Chem Inf Model - QSAR classification model for antibacterial compounds and its use in virtual screening. ( 0,514478452864928 )
J Chem Inf Model - Fighting high molecular weight in bioactive molecules with sub-pharmacophore-based virtual screening. ( 0,513838114742208 )
J Chem Inf Model - Application of support vector machine to three-dimensional shape-based virtual screening using comprehensive three-dimensional molecular shape overlay with known inhibitors. ( 0,513495987860434 )
J Chem Inf Model - Introduction of the conditional correlated Bernoulli model of similarity value distributions and its application to the prospective prediction of fingerprint search performance. ( 0,513425863506999 )
J Med Syst - Utilization of electronic medical records to build a detection model for surveillance of healthcare-associated urinary tract infections. ( 0,51309592899797 )
J Biomed Inform - MysiRNA: improving siRNA efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy (ΔG). ( 0,51270853545209 )
J Chem Inf Model - Quantitative structure-activity relationship models for ready biodegradability of chemicals. ( 0,512563071907221 )
J Chem Inf Model - Ligand-based virtual screening approach using a new scoring function. ( 0,511900200757204 )
Med Decis Making - A comparison of methods for converting DCE values onto the full health-dead QALY scale. ( 0,511453688045299 )
J Chem Inf Model - Discovery and design of tricyclic scaffolds as protein kinase CK2 (CK2) inhibitors through a combination of shape-based virtual screening and structure-based molecular modification. ( 0,510740380391421 )
Comput Methods Programs Biomed - Monitoring of anticoagulant therapy applying a dynamic statistical model. ( 0,510598170924593 )
J Chem Inf Model - Multitarget structure-activity relationships characterized by activity-difference maps and consensus similarity measure. ( 0,510021515851742 )
J Chem Inf Model - Applicability Domain ANalysis (ADAN): a robust method for assessing the reliability of drug property predictions. ( 0,50997213288641 )
IEEE Trans Image Process - Network-based H.264/AVC whole frame loss visibility model and frame dropping methods. ( 0,509494001241134 )
IEEE Trans Image Process - DEB: definite error bounded tangent estimator for digital curves. ( 0,509475406112143 )
J Chem Inf Model - Small-molecule 3D structure prediction using open crystallography data. ( 0,508934000251986 )
J Chem Inf Model - Dual histamine H3R/serotonin 5-HT4R ligands with antiamnesic properties: pharmacophore-based virtual screening and polypharmacology. ( 0,508614549482185 )
J Chem Inf Model - Scaffold-focused virtual screening: prospective application to the discovery of TTK inhibitors. ( 0,508334429824484 )
Brief. Bioinformatics - Added predictive value of high-throughput molecular data to clinical data and its validation. ( 0,50730803692378 )
J Chem Inf Model - Hsp90 inhibitors, part 2: combining ligand-based and structure-based approaches for virtual screening application. ( 0,506757865846797 )
Comput Methods Programs Biomed - Privacy-preserving Kruskal-Wallis test. ( 0,506367577612382 )
J Chem Inf Model - Enrichment of chemical libraries docked to protein conformational ensembles and application to aldehyde dehydrogenase 2. ( 0,503891382198635 )
Comput. Biol. Med. - CoMFA QSAR models of camptothecin analogues based on the distinctive SAR features of combined ABC, CD and E ring substitutions. ( 0,501872883080104 )
J. Comput. Biol. - Prediction of siRNA potency using sparse logistic regression. ( 0,501474789324052 )