J Chem Inf Model - Binary classification of a large collection of environmental chemicals from estrogen receptor assays by quantitative structure-activity relationship and machine learning methods.


{ compound(1573) activ(1297) structur(1058) }
{ model(2656) set(1616) predict(1553) }
{ featur(3375) classif(2383) classifi(1994) }
{ perform(1367) use(1326) method(1137) }
{ learn(2355) train(1041) set(1003) }
{ implement(1333) system(1263) develop(1122) }
{ bind(1733) structur(1185) ligand(1036) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1057) registr(996) error(939) }
{ assess(1506) score(1403) qualiti(1306) }
{ data(3008) multipl(1320) sourc(1022) }
{ drug(1928) target(777) effect(648) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ design(1359) user(1324) use(1319) }
{ care(1570) inform(1187) nurs(1089) }
{ howev(809) still(633) remain(590) }
{ spatial(1525) area(1432) region(1030) }
{ age(1611) year(1155) adult(843) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ method(2212) result(1239) propos(1039) }
{ system(1976) rule(880) can(841) }
{ sequenc(1873) structur(1644) protein(1328) }
{ imag(2830) propos(1344) filter(1198) }
{ patient(2315) diseas(1263) diabet(1191) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ control(1307) perform(991) simul(935) }
{ search(2224) databas(1162) retriev(909) }
{ risk(3053) factor(974) diseas(938) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ state(1844) use(1261) util(961) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ analysi(2126) use(1163) compon(1037) }
{ decis(3086) make(1611) patient(1517) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ measur(2081) correl(1212) valu(896) }
{ method(1219) similar(1157) match(930) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ data(1714) softwar(1251) tool(1186) }
{ model(2220) cell(1177) simul(1124) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ visual(1396) interact(850) tool(830) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ signal(2180) analysi(812) frequenc(800) }
{ sampl(1606) size(1419) use(1276) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ process(1125) use(805) approach(778) }
{ detect(2391) sensit(1101) algorithm(908) }


Thousands of environmental chemicals are subject to regulatory decisions regarding their endocrine-disrupting potential. The ToxCast and Tox21 programs have tested ~8200 chemicals in a broad panel of in vitro high-throughput screening (HTS) assays for estrogen receptor (ER) agonist and antagonist activity. The present work uses this large data set to develop in silico quantitative structure-activity relationship (QSAR) models with machine learning (ML) methods, together with a novel approach for handling the imbalanced class distribution. Training compounds from the ToxCast project were assigned to active (binding) or inactive (nonbinding) classes based on a composite ER Interaction Score derived from 13 ER in vitro assays. A total of 1537 ToxCast chemicals were used to derive and optimize the binary classification models, while 5073 additional chemicals from the Tox21 project, evaluated in 2 of the 13 in vitro assays, were used to validate model performance externally. To handle the imbalanced distribution of active and inactive chemicals, we developed a cluster-selection strategy designed to minimize information loss and increase predictive performance, and compared it with three currently popular techniques: cost-sensitive learning, oversampling of the minority class, and undersampling of the majority class. QSAR classification models relating the molecular structures of chemicals to their ER activity were built using linear discriminant analysis (LDA), classification and regression trees (CART), and support vector machines (SVM), with 51 molecular descriptors from QikProp and 4328 structural fingerprint bits as explanatory variables. A random forest (RF) feature selection method was used to identify the structural features most relevant to ER activity.
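The abstract names the imbalance-handling strategies but does not say how the cluster-selection step works. One plausible reading, sketched below purely as an assumption, is cluster-based undersampling: group the majority-class (inactive) compounds by fingerprint similarity and keep representatives from every cluster, so the reduced set still spans the chemical space instead of being a random subset. Fingerprints are modelled as sets of on-bit indices; the names `leader_clusters` and `cluster_undersample`, the greedy leader clustering, and the 0.6 Tanimoto threshold are all hypothetical choices, not the authors' method. The three comparison techniques are shown in their simplest common forms.

```python
import random
from collections import Counter

# Fingerprints are modelled as frozensets of "on" bit indices (an assumption;
# the paper's 4328-bit fingerprints could be stored this way).


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two on-bit sets."""
    if not a and not b:
        return 1.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def leader_clusters(fps, threshold=0.6):
    """Greedy leader clustering: a fingerprint joins the first cluster whose
    leader is at least `threshold` similar, otherwise it starts a new cluster.
    Returns a list of clusters, each a list of indices into `fps`."""
    leaders, clusters = [], []
    for i, fp in enumerate(fps):
        for c, lead in enumerate(leaders):
            if tanimoto(fp, fps[lead]) >= threshold:
                clusters[c].append(i)
                break
        else:
            leaders.append(i)
            clusters.append([i])
    return clusters


def cluster_undersample(majority_fps, n_keep, threshold=0.6, seed=0):
    """Hypothetical cluster-based undersampling: keep n_keep majority-class
    compounds, drawn round-robin across clusters so that every cluster keeps
    at least one representative whenever n_keep >= number of clusters."""
    rng = random.Random(seed)
    clusters = leader_clusters(majority_fps, threshold)
    rng.shuffle(clusters)          # avoid always favouring early clusters
    for members in clusters:
        rng.shuffle(members)       # random representative within a cluster
    chosen, depth = [], 0
    while len(chosen) < n_keep:
        took_any = False
        for members in clusters:
            if depth < len(members) and len(chosen) < n_keep:
                chosen.append(members[depth])
                took_any = True
        if not took_any:           # fewer than n_keep compounds exist
            break
        depth += 1
    return chosen


def random_undersample(majority_idx, n_keep, seed=0):
    """Baseline: drop majority-class compounds uniformly at random."""
    return random.Random(seed).sample(list(majority_idx), n_keep)


def random_oversample(minority_idx, n_target, seed=0):
    """Baseline: duplicate minority-class compounds until n_target is reached."""
    rng = random.Random(seed)
    minority_idx = list(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_target - len(minority_idx))]
    return minority_idx + extra


def balanced_class_weights(labels):
    """Cost-sensitive baseline: weight each class by n / (k * n_class), so
    misclassifying a rare active costs more than a common inactive."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}
```

In a setting like the ToxCast training set, `majority_fps` would hold the inactive compounds' fingerprints and `n_keep` would typically equal the number of actives. The difference from random undersampling is only in *which* inactives survive: the cluster-based version guarantees coverage of each similarity cluster, which is one way to read "minimize information loss".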
The best model combined SVM with a subset of descriptors selected from the larger set by the RF algorithm; it recognized active and inactive compounds with accuracies of 76.1% and 82.8%, respectively, and reached a total accuracy of 81.6% on the internal test set and 70.8% on the external test set. These results demonstrate that combining high-quality experimental data with ML methods can yield robust models with excellent predictive accuracy, which are potentially useful for the virtual screening of chemicals in environmental risk assessment.
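The abstract reports per-class accuracies (for actives and inactives, usually called sensitivity Se and specificity Sp) alongside total accuracy. Total accuracy is the prevalence-weighted mean of the two, accuracy = p·Se + (1 − p)·Sp, where p is the fraction of actives in the evaluated set; this is why 81.6% lies much closer to the inactive-class figure. The sketch below computes these metrics from confusion-matrix counts and, assuming the three quoted percentages all describe the same test set, solves for p. Balanced accuracy is included only for comparison; the abstract does not report it, and the function names are illustrative.

```python
def classification_summary(tp, fn, tn, fp):
    """Per-class and overall accuracies for an active/inactive classifier.
    Actives are treated as the positive class."""
    sensitivity = tp / (tp + fn)            # fraction of actives recognised
    specificity = tn / (tn + fp)            # fraction of inactives recognised
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


def implied_active_fraction(sensitivity, specificity, accuracy):
    """Solve accuracy = p * sensitivity + (1 - p) * specificity for p,
    the fraction of actives in the evaluated set."""
    return (specificity - accuracy) / (specificity - sensitivity)


# Figures quoted in the abstract, assumed to refer to the same test set.
p = implied_active_fraction(0.761, 0.828, 0.816)
print(f"implied share of actives: {p:.2f}")  # → implied share of actives: 0.18
```

Because the published percentages are rounded to 0.1%, the same calculation with the extremes of each rounding interval gives roughly 16% to 20% actives, which is consistent with the class imbalance the abstract describes.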

Clean Abstract

thousand environment chemic subject regulatori decis endocrin disrupt potenti toxcast tox program test chemic broad screen panel vitro highthroughput screen hts assay estrogen receptor er agonist antagonist activ present work use larg data set develop silico quantit structureact relationship qsar model use machin learn ml method novel approach manag imbalanc data distribut train compound toxcast project categor activ inact bind nonbind class base composit er interact score deriv collect er vitro assay total chemic toxcast use deriv optim binari classif model addit chemic tox project evalu vitro assay use extern valid model perform order handl imbalanc distribut activ inact chemic develop clusterselect strategi minim inform loss increas predict perform compar strategi three current popular techniqu costsensit learn oversampl minor class undersampl major class qsar classif model built relat molecular structur chemic er activ use linear discrimin analysi lda classif regress tree cart support vector machin svm molecular descriptor qikprop bit structur fingerprint explanatori variabl random forest rf featur select method employ extract structur featur relev er activ best model obtain use svm combin subset descriptor identifi larg set via rf algorithm recogn activ inact compound accuraci total accuraci intern test set extern test set result demonstr combin highqual experiment data ml method can lead robust model achiev excel predict accuraci potenti use facilit virtual screen chemic environment risk assess

Similar Abstracts

J Chem Inf Model - Predictions of BuChE inhibitors using support vector machine and naive Bayesian classification techniques in drug discovery. ( 0,923640312525992 )
J Chem Inf Model - Hsp90 inhibitors, part 1: definition of 3-D QSAutogrid/R models as a tool for virtual screening. ( 0,90103068443415 )
J Chem Inf Model - Oversampling to overcome overfitting: exploring the relationship between data set composition, molecular descriptors, and predictive modeling methods. ( 0,883944399232828 )
J Chem Inf Model - Coping with unbalanced class data sets in oral absorption models. ( 0,872079309533794 )
J Chem Inf Model - Profile-QSAR and Surrogate AutoShim protein-family modeling of proteases. ( 0,864058921542907 )
J Am Med Inform Assoc - Drug repurposing: mining protozoan proteomes for targets of known bioactive compounds. ( 0,862360627370199 )
J Chem Inf Model - Structure based model for the prediction of phospholipidosis induction potential of small molecules. ( 0,860305931578214 )
J Chem Inf Model - Classification of compounds with distinct or overlapping multi-target activities and diverse molecular mechanisms using emerging chemical patterns. ( 0,85922457264797 )
J Chem Inf Model - Statistical analysis and compound selection of combinatorial libraries for soluble epoxide hydrolase. ( 0,852025740018022 )
J Chem Inf Model - Design and synthesis of new antioxidants predicted by the model developed on a set of pulvinic acid derivatives. ( 0,850195369420654 )
J Chem Inf Model - In silico prediction of total human plasma clearance. ( 0,842003645674565 )
J Chem Inf Model - Discovery of novel antimalarial compounds enabled by QSAR-based virtual screening. ( 0,84188155240324 )
J Chem Inf Model - Profile-QSAR: a novel meta-QSAR method that combines activities across the kinase family to accurately predict affinity, selectivity, and cellular activity. ( 0,834545291871571 )
J Chem Inf Model - Discovering new agents active against methicillin-resistant Staphylococcus aureus with ligand-based approaches. ( 0,83072866521386 )
J Chem Inf Model - Jointly handling potency and toxicity of antimicrobial peptidomimetics by simple rules from desirability theory and chemoinformatics. ( 0,826619434378411 )
J Chem Inf Model - Hsp90 inhibitors, part 2: combining ligand-based and structure-based approaches for virtual screening application. ( 0,814688657974174 )
J Chem Inf Model - In silico prediction of chemical Ames mutagenicity. ( 0,814204631218755 )
J Chem Inf Model - A new protocol for predicting novel GSK-3β ATP competitive inhibitors. ( 0,813343160367357 )
J Chem Inf Model - QSAR classification model for antibacterial compounds and its use in virtual screening. ( 0,805505929464871 )
J Chem Inf Model - Application of quantitative structure-activity relationship models of 5-HT1A receptor binding to virtual screening identifies novel and potent 5-HT1A ligands. ( 0,799287443520837 )
J Chem Inf Model - Construction and use of fragment-augmented molecular Hasse diagrams. ( 0,795042439733048 )
J Chem Inf Model - Quantitative structure-activity relationship models for ready biodegradability of chemicals. ( 0,78843914196648 )
J Chem Inf Model - Revisiting the general solubility equation: in silico prediction of aqueous solubility incorporating the effect of topographical polar surface area. ( 0,786494197938902 )
J Chem Inf Model - Compound set enrichment: a novel approach to analysis of primary HTS data. ( 0,786161418547075 )
J Chem Inf Model - In silico prediction of aqueous solubility using simple QSPR models: the importance of phenol and phenol-like moieties. ( 0,778558552652746 )
J Chem Inf Model - A comparison of different QSAR approaches to modeling CYP450 1A2 inhibition. ( 0,773771459569489 )
J Chem Inf Model - Automated building of organometallic complexes from 3D fragments. ( 0,765894848016065 )
J Chem Inf Model - Synthesis, bioassay, and molecular field topology analysis of diverse vasodilatory heterocycles. ( 0,760395980579524 )
J Chem Inf Model - Experimental and computational prediction of glass transition temperature of drugs. ( 0,75687219765273 )
J Chem Inf Model - Predictive models for cytochrome p450 isozymes based on quantitative high throughput screening data. ( 0,756278519018123 )
J Chem Inf Model - Modeling drug-induced anorexia by molecular topology. ( 0,755322432574129 )
J Chem Inf Model - Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection. ( 0,754933890984733 )
J Chem Inf Model - How accurately can we predict the melting points of drug-like compounds? ( 0,752503683039417 )
J Chem Inf Model - GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design. ( 0,752245904361356 )
J Chem Inf Model - A new approach to radial basis function approximation and its application to QSAR. ( 0,74708667010913 )
J Chem Inf Model - QSAR modeling of imbalanced high-throughput screening data in PubChem. ( 0,746860806042553 )
J Chem Inf Model - Discovery and design of tricyclic scaffolds as protein kinase CK2 (CK2) inhibitors through a combination of shape-based virtual screening and structure-based molecular modification. ( 0,743490445821542 )
J Chem Inf Model - Predicting myelosuppression of drugs from in silico models. ( 0,732912871404153 )
J Chem Inf Model - Analysis and study of molecule data sets using snowflake diagrams of weighted maximum common subgraph trees. ( 0,729209818513035 )
BMC Med Inform Decis Mak - Regression tree construction by bootstrap: model search for DRG-systems applied to Austrian health-data. ( 0,727759354050331 )
J Chem Inf Model - Freely available conformer generation methods: how good are they? ( 0,726423827537239 )
J Chem Inf Model - Pharmacophore assessment through 3-D QSAR: evaluation of the predictive ability on new derivatives by the application on a series of antitubercular agents. ( 0,725321014101996 )
J Chem Inf Model - Fighting high molecular weight in bioactive molecules with sub-pharmacophore-based virtual screening. ( 0,722997982775755 )
J Chem Inf Model - TIN-a combinatorial compound collection of synthetically feasible multicomponent synthesis products. ( 0,717738179446871 )
J Chem Inf Model - Exploring uncharted territories: predicting activity cliffs in structure-activity landscapes. ( 0,717119067812282 )
J Integr Bioinform - Database supported candidate search for metabolite identification. ( 0,715805386559471 )
J Chem Inf Model - Quantitative structure-activity relationship models of clinical pharmacokinetics: clearance and volume of distribution. ( 0,715053786997717 )
J Chem Inf Model - A multivariate chemical similarity approach to search for drugs of potential environmental concern. ( 0,713105866562311 )
J Chem Inf Model - Combining horizontal and vertical substructure relationships in scaffold hierarchies for activity prediction. ( 0,711822648148337 )
J Chem Inf Model - Identification of novel malarial cysteine protease inhibitors using structure-based virtual screening of a focused cysteine protease inhibitor library. ( 0,70953537532712 )
J Chem Inf Model - Applicability Domain ANalysis (ADAN): a robust method for assessing the reliability of drug property predictions. ( 0,709520005277615 )
J Chem Inf Model - Prediction of activity cliffs using support vector machines. ( 0,706188114002193 )
J Chem Inf Model - Knowledge-based libraries for predicting the geometric preferences of druglike molecules. ( 0,70586669238445 )
J Chem Inf Model - In silico assessment of chemical biodegradability. ( 0,70560249458185 )
J Chem Inf Model - Maximum-score diversity selection for early drug discovery. ( 0,702377512738988 )
J Chem Inf Model - Polypharmacology directed compound data mining: identification of promiscuous chemotypes with different activity profiles and comparison to approved drugs. ( 0,702311566684323 )
J Chem Inf Model - Identification of a novel inhibitor of dengue virus protease through use of a virtual screening drug discovery Web portal. ( 0,700686846126988 )
J Chem Inf Model - Target-independent prediction of drug synergies using only drug lipophilicity. ( 0,698075743774316 )
J Chem Inf Model - Kinome-wide activity modeling from diverse public high-quality data sets. ( 0,697359785168282 )
J Chem Inf Model - Identification of inhibitors against p90 ribosomal S6 kinase 2 (RSK2) through structure-based virtual screening with the inhibitor-constrained refined homology model. ( 0,697270547781175 )
J Chem Inf Model - Development of a minimal kinase ensemble receptor (MKER) for surrogate AutoShim. ( 0,696197430623139 )
J Chem Inf Model - Chemical data visualization and analysis with incremental generative topographic mapping: big data challenge. ( 0,694907333975772 )
J Chem Inf Model - How diverse are diversity assessment methods? A comparative analysis and benchmarking of molecular descriptor space. ( 0,694460265042791 )
J Chem Inf Model - Molecular modeling of the 3D structure of 5-HT(1A)R: discovery of novel 5-HT(1A)R agonists via dynamic pharmacophore-based virtual screening. ( 0,694290771935143 )
J Chem Inf Model - BioSM: metabolomics tool for identifying endogenous mammalian biochemical structures in chemical structure space. ( 0,693640724795417 )
J Chem Inf Model - Validation of the AmpC β-lactamase binding site and identification of inhibitors with novel scaffolds. ( 0,692373657500842 )
J Chem Inf Model - Locating sweet spots for screening hits and evaluating pan-assay interference filters from the performance analysis of two lead-like libraries. ( 0,69178419878033 )
J Chem Inf Model - Similarity boosted quantitative structure-activity relationship--a systematic study of enhancing structural descriptors by molecular similarity. ( 0,691295204226928 )
J Chem Inf Model - Identification of novel serotonin transporter compounds by virtual screening. ( 0,689822566847628 )
J Chem Inf Model - Automated recycling of chemistry for virtual screening and library design. ( 0,685909891665402 )
J Chem Inf Model - Structural similarity based kriging for quantitative structure activity and property relationship modeling. ( 0,685741625536777 )
Curr Comput Aided Drug Des - Development of Chemical Compound Libraries for In Silico Drug Screening. ( 0,684113117725389 )
Curr Comput Aided Drug Des - QSAR Models for the Reactivation of Sarin Inhibited AChE by Quaternary Pyridinium Oximes Based on Monte Carlo Method. ( 0,683648623605506 )
J Chem Inf Model - FAst MEtabolizer (FAME): A rapid and accurate predictor of sites of metabolism in multiple species by endogenous enzymes. ( 0,682328601165552 )
J Chem Inf Model - ColBioS-FlavRC: a collection of bioselective flavonoids and related compounds filtered from high-throughput screening outcomes. ( 0,682267682625842 )
J Chem Inf Model - De novo design of drug-like molecules by a fragment-based molecular evolutionary approach. ( 0,681600924826231 )
J Chem Inf Model - In silico enzymatic synthesis of a 400,000 compound biochemical database for nontargeted metabolomics. ( 0,681411158192528 )
J Chem Inf Model - Time-split cross-validation as a method for estimating the goodness of prospective prediction. ( 0,680465047891385 )
J Chem Inf Model - Identification of 1,2,5-oxadiazoles as a new class of SENP2 inhibitors using structure based virtual screening. ( 0,680446579379758 )
J Chem Inf Model - Identification of novel liver X receptor activators by structure-based modeling. ( 0,680232157301835 )
J Chem Inf Model - Conditional probabilistic analysis for prediction of the activity landscape and relative compound activities. ( 0,680159072017353 )
J Chem Inf Model - Design of multitarget activity landscapes that capture hierarchical activity cliff distributions. ( 0,678517252425114 )
J Chem Inf Model - G-protein coupled receptors virtual screening using genetic algorithm focused chemical space. ( 0,678504639761082 )
J Chem Inf Model - Quantitative structure-activity relationship models of chemical transformations from matched pairs analyses. ( 0,677480719783256 )
J Chem Inf Model - Ligand- and structure-based virtual screening for clathrodin-derived human voltage-gated sodium channel modulators. ( 0,677466729387815 )
J Chem Inf Model - Increasing the coverage of medicinal chemistry-relevant space in commercial fragments screening. ( 0,676051232453292 )
J Chem Inf Model - Integrating medicinal chemistry, organic/combinatorial chemistry, and computational chemistry for the discovery of selective estrogen receptor modulators with Forecaster, a novel platform for drug discovery. ( 0,675411147054359 )
J Chem Inf Model - Kinase-kernel models: accurate in silico screening of 4 million compounds across the entire human kinome. ( 0,675094928327165 )
J Chem Inf Model - Natural product-like virtual libraries: recursive atom-based enumeration. ( 0,674920474162093 )
Comput. Biol. Med. - In silico prediction of spleen tyrosine kinase inhibitors using machine learning approaches and an optimized molecular descriptor subset generated by recursive feature elimination method. ( 0,674420729664946 )
J Chem Inf Model - Application of computer-aided drug repurposing in the search of new cruzipain inhibitors: discovery of amiodarone and bromocriptine inhibitory effects. ( 0,67388903258226 )
J Chem Inf Model - Predicting pK(a) values of substituted phenols from atomic charges: comparison of different quantum mechanical methods and charge distribution schemes. ( 0,672953548497597 )
J Chem Inf Model - Feasibility of using molecular docking-based virtual screening for searching dual target kinase inhibitors. ( 0,672000814643449 )
J Chem Inf Model - Identification of novel S-adenosyl-L-homocysteine hydrolase inhibitors through homology-model-based virtual screening, synthesis, and biological evaluation. ( 0,671010655016892 )
J Chem Inf Model - Mining the ChEMBL database: an efficient chemoinformatics workflow for assembling an ion channel-focused screening library. ( 0,67024904247382 )
J Chem Inf Model - Development of novel 3D-QSAR combination approach for screening and optimizing B-Raf inhibitors in silico. ( 0,670211783225761 )
J Chem Inf Model - Application of support vector machine to three-dimensional shape-based virtual screening using comprehensive three-dimensional molecular shape overlay with known inhibitors. ( 0,670124809098947 )
J Chem Inf Model - Identification of sumoylation activating enzyme 1 inhibitors by structure-based virtual screening. ( 0,669875165851572 )
J Chem Inf Model - Selection of in silico drug screening results for G-protein-coupled receptors by using universal active probes. ( 0,669613433461121 )
J Chem Inf Model - Pre-processing feature selection for improved C&RT models for oral absorption. ( 0,668752581593106 )