J Chem Inf Model - Oversampling to overcome overfitting: exploring the relationship between data set composition, molecular descriptors, and predictive modeling methods.

Topics

{ model(2656) set(1616) predict(1553) }
{ compound(1573) activ(1297) structur(1058) }
{ featur(3375) classif(2383) classifi(1994) }
{ assess(1506) score(1403) qualiti(1306) }
{ gene(2352) biolog(1181) express(1162) }
{ data(1737) use(1416) pattern(1282) }
{ perform(1367) use(1326) method(1137) }
{ age(1611) year(1155) adult(843) }
{ group(2977) signific(1463) compar(1072) }
{ method(1557) propos(1049) approach(1037) }
{ can(981) present(881) function(850) }
{ system(1976) rule(880) can(841) }
{ chang(1828) time(1643) increas(1301) }
{ howev(809) still(633) remain(590) }
{ research(1085) discuss(1038) issu(1018) }
{ sampl(1606) size(1419) use(1276) }
{ health(1844) social(1437) communiti(874) }
{ model(3404) distribut(989) bayesian(671) }
{ bind(1733) structur(1185) ligand(1036) }
{ motion(1329) object(1292) video(1091) }
{ treatment(1704) effect(941) patient(846) }
{ learn(2355) train(1041) set(1003) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ design(1359) user(1324) use(1319) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ data(3963) clinic(1234) research(1004) }
{ system(1050) medic(1026) inform(1018) }
{ medic(1828) order(1363) alert(1069) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ estim(2440) model(1874) function(577) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ concept(1167) ontolog(924) domain(897) }
{ extract(1171) text(1153) clinic(932) }
{ data(1714) softwar(1251) tool(1186) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Traditional biological assays are very time-consuming, so the ability to quickly screen large numbers of compounds against a specific biological target is appealing. To speed up the biological evaluation of compounds, high-throughput screening is widely used in biomedical research, bioinformatics, and drug discovery. This study uses support vector machines (SVMs), a machine learning method, together with various classes of molecular descriptors and different sampling techniques to overcome overfitting when classifying compounds for cytotoxicity against the Jurkat cell line. The cytotoxicity data set is imbalanced (few active compounds and very many inactive ones), and this imbalance adversely affects predictive modeling methods. Models built from imbalanced data sets commonly overfit toward the dominant class. In this study, the models routinely overfit toward inactive (noncytotoxic) compounds when the imbalance was substantial. SVM models were used to probe the effectiveness of different classes of molecular descriptors and different oversampling ratios. SVM models were constructed from the 4D-FP, MOE (1D, 2D, and 2½D), noNP+MOE, and CATS2D trial descriptor pools. Their predictive abilities were compared with those of CATS2D-based random forest models. Compared with previously reported results, the SVM models built from oversampled data sets showed better predictive ability on both the training and external test sets.
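The abstract does not spell out the oversampling procedure. One common approach is random oversampling, and it is assumed here. In random oversampling, minority-class (active) compounds in the training set are duplicated at random until the active-to-inactive ratio reaches a chosen target. The sketch below uses only the Python standard library. The function name `oversample_minority` and its `ratio` parameter are hypothetical, and descriptor vectors are represented as plain lists of floats.

```python
import random
from collections import Counter


def oversample_minority(samples, labels, ratio=1.0, seed=0):
    """Hypothetical helper: random oversampling for a binary data set.

    Minority-class samples are drawn with replacement and appended until
    n_minority is roughly `ratio` * n_majority. If the minority class is
    already at or above that size, nothing is added.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (majority, n_major), (minority, n_minor) = counts.most_common()
    target = int(round(ratio * n_major))

    rng = random.Random(seed)  # seeded so the resampling is reproducible
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    # Indices of the minority samples to duplicate (drawn with replacement).
    extra = [rng.choice(minority_idx) for _ in range(max(0, target - n_minor))]

    # Originals first, then the duplicated minority samples.
    new_samples = list(samples) + [samples[i] for i in extra]
    new_labels = list(labels) + [labels[i] for i in extra]
    return new_samples, new_labels


if __name__ == "__main__":
    X = [[float(i)] for i in range(12)]  # toy one-descriptor "compounds"
    y = [0] * 10 + [1] * 2               # 10 inactive, 2 active
    for r in (0.5, 1.0):
        _, yr = oversample_minority(X, y, ratio=r)
        print(r, Counter(yr))
```

Resample only the training set. Duplicating compounds before the train/test split would leak copies into the external test set and inflate its apparent performance. The resampled data could then be passed to an SVM such as scikit-learn's `sklearn.svm.SVC`. Sweeping `ratio` is one way to compare oversampling ratios in the spirit of the study.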

Clean Abstract

tradit biolog assay timeconsum thus abil quick screen larg number compound specif biolog target appeal speed biolog evalu compound highthroughput screen wide use field biomed biolog inform drug discoveri research present studi focus use support vector machin machin learn method various class molecular descriptor differ sampl techniqu overcom overfit classifi compound cytotox respect jurkat cell line cell cytotox data set imbalanc activ compound mani inact compound abil predict model method advers affect situat common imbalanc data set overfit respect domin classifi end point studi model routin overfit toward inact noncytotox compound imbal substanti support vector machin svm model use probe profici differ class molecular descriptor oversampl ratio svm model construct dfps moe d d d nonpmo catsd trial descriptor pool compar predict abil catsdbas random forest model compar previous result literatur svm model built oversampl data set exhibit better predict abil train extern test set
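The clean abstract appears to be produced in four steps. The abstract is lowercased, digits and punctuation are deleted without leaving a space, stopwords are removed, and the remaining words are stemmed. That would explain tokens such as `dfps` (from `4D-FPs`), `d d d` (from `1D, 2D, and 21/2D`), and `catsd` (from `CATS2D`). The sketch below assumes that pipeline and covers only the first three steps. The `normalize` function and its small stopword list are hypothetical, and no stemmer is applied. As a result, tokens such as `nonpmoe` come out unstemmed, whereas the field above shows the stemmed `nonpmo`.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "and", "of", "to", "a", "in", "is", "for", "with", "from", "are", "were"}


def normalize(text):
    """Lowercase, delete every character that is not an ASCII letter or
    whitespace (so '4D-FPs' collapses to 'dfps'), then drop stopwords."""
    letters_only = re.sub(r"[^a-z\s]", "", text.lower())
    return [tok for tok in letters_only.split() if tok not in STOPWORDS]


if __name__ == "__main__":
    print(normalize("4D-FPs, MOE (1D, 2D, and 21/2D), noNP+MOE, and CATS2D"))
```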

Similar Abstracts

J Chem Inf Model - Binary classification of a large collection of environmental chemicals from estrogen receptor assays by quantitative structure-activity relationship and machine learning methods. ( 0,883944399232828 )
J Chem Inf Model - Predictions of BuChE inhibitors using support vector machine and naive Bayesian classification techniques in drug discovery. ( 0,819794407166485 )
J Chem Inf Model - Coping with unbalanced class data sets in oral absorption models. ( 0,808588915567336 )
J Chem Inf Model - Hsp90 inhibitors, part 1: definition of 3-D QSAutogrid/R models as a tool for virtual screening. ( 0,794818356116047 )
J Chem Inf Model - In silico prediction of chemical Ames mutagenicity. ( 0,791166863643953 )
J Chem Inf Model - Classification of compounds with distinct or overlapping multi-target activities and diverse molecular mechanisms using emerging chemical patterns. ( 0,769918647096702 )
J Chem Inf Model - In silico prediction of total human plasma clearance. ( 0,761005483513604 )
J Chem Inf Model - Statistical analysis and compound selection of combinatorial libraries for soluble epoxide hydrolase. ( 0,759323133404121 )
J Chem Inf Model - Structure based model for the prediction of phospholipidosis induction potential of small molecules. ( 0,746774562304492 )
J Chem Inf Model - Design and synthesis of new antioxidants predicted by the model developed on a set of pulvinic acid derivatives. ( 0,744182328657689 )
J Chem Inf Model - Jointly handling potency and toxicity of antimicrobial peptidomimetics by simple rules from desirability theory and chemoinformatics. ( 0,742057216702508 )
J Chem Inf Model - In silico prediction of aqueous solubility using simple QSPR models: the importance of phenol and phenol-like moieties. ( 0,741897389447398 )
J Chem Inf Model - Profile-QSAR and Surrogate AutoShim protein-family modeling of proteases. ( 0,738429152042242 )
J Chem Inf Model - Discovery of novel antimalarial compounds enabled by QSAR-based virtual screening. ( 0,73591701631798 )
J Chem Inf Model - Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection. ( 0,732448065669938 )
J Chem Inf Model - GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design. ( 0,730420892979734 )
J Chem Inf Model - Quantitative structure-activity relationship models for ready biodegradability of chemicals. ( 0,728096636002744 )
J Chem Inf Model - Profile-QSAR: a novel meta-QSAR method that combines activities across the kinase family to accurately predict affinity, selectivity, and cellular activity. ( 0,725012110072916 )
J Chem Inf Model - Discovering new agents active against methicillin-resistant Staphylococcus aureus with ligand-based approaches. ( 0,725008934820462 )
J Am Med Inform Assoc - Drug repurposing: mining protozoan proteomes for targets of known bioactive compounds. ( 0,724168105923013 )
J Chem Inf Model - In silico assessment of chemical biodegradability. ( 0,709948495537129 )
J Chem Inf Model - A comparison of different QSAR approaches to modeling CYP450 1A2 inhibition. ( 0,708522333048779 )
J Chem Inf Model - Hsp90 inhibitors, part 2: combining ligand-based and structure-based approaches for virtual screening application. ( 0,707284549179814 )
J Chem Inf Model - A new protocol for predicting novel GSK-3β ATP competitive inhibitors. ( 0,698191981354158 )
J Chem Inf Model - Applicability Domain ANalysis (ADAN): a robust method for assessing the reliability of drug property predictions. ( 0,697984682840323 )
J Chem Inf Model - Pre-processing feature selection for improved C&RT models for oral absorption. ( 0,697270012278447 )
J Chem Inf Model - Construction and use of fragment-augmented molecular Hasse diagrams. ( 0,694584620011296 )
J Chem Inf Model - Automated building of organometallic complexes from 3D fragments. ( 0,68970406884206 )
J Chem Inf Model - Experimental and computational prediction of glass transition temperature of drugs. ( 0,68144917716334 )
J Chem Inf Model - Synthesis, bioassay, and molecular field topology analysis of diverse vasodilatory heterocycles. ( 0,679906729936954 )
BMC Med Inform Decis Mak - Regression tree construction by bootstrap: model search for DRG-systems applied to Austrian health-data. ( 0,669563778084089 )
J Chem Inf Model - Beyond the scope of Free-Wilson analysis: building interpretable QSAR models with machine learning algorithms. ( 0,669090166911115 )
J Chem Inf Model - QSAR modeling of imbalanced high-throughput screening data in PubChem. ( 0,665512949599809 )
J Chem Inf Model - How accurately can we predict the melting points of drug-like compounds? ( 0,665098860301349 )
J Chem Inf Model - Time-split cross-validation as a method for estimating the goodness of prospective prediction. ( 0,663849512343429 )
J Chem Inf Model - Revisiting the general solubility equation: in silico prediction of aqueous solubility incorporating the effect of topographical polar surface area. ( 0,663704751672349 )
J Chem Inf Model - A new approach to radial basis function approximation and its application to QSAR. ( 0,663467840957273 )
J. Comput. Biol. - Biomarker discovery using statistically significant gene sets. ( 0,663466628866062 )
J Chem Inf Model - Molecular modeling of the 3D structure of 5-HT(1A)R: discovery of novel 5-HT(1A)R agonists via dynamic pharmacophore-based virtual screening. ( 0,661885207299576 )
J Chem Inf Model - Compound set enrichment: a novel approach to analysis of primary HTS data. ( 0,661871264537756 )
J Chem Inf Model - QSAR classification model for antibacterial compounds and its use in virtual screening. ( 0,661180040757779 )
J Chem Inf Model - Analysis and study of molecule data sets using snowflake diagrams of weighted maximum common subgraph trees. ( 0,660821782682653 )
J Chem Inf Model - Pharmacophore assessment through 3-D QSAR: evaluation of the predictive ability on new derivatives by the application on a series of antitubercular agents. ( 0,652452464695622 )
J Biomed Inform - Selection of interdependent genes via dynamic relevance analysis for cancer diagnosis. ( 0,651599908245871 )
J Chem Inf Model - Predictive models for cytochrome p450 isozymes based on quantitative high throughput screening data. ( 0,650961133987608 )
J Chem Inf Model - Predicting myelosuppression of drugs from in silico models. ( 0,649506052782713 )
J Chem Inf Model - Application of the 4D fingerprint method with a robust scoring function for scaffold-hopping and drug repurposing strategies. ( 0,649250543985855 )
Comput. Biol. Med. - In silico prediction of spleen tyrosine kinase inhibitors using machine learning approaches and an optimized molecular descriptor subset generated by recursive feature elimination method. ( 0,648384013030178 )
AMIA Annu Symp Proc - Effect of data combination on predictive modeling: a study using gene expression data. ( 0,647443499411907 )
J Chem Inf Model - Application of quantitative structure-activity relationship models of 5-HT1A receptor binding to virtual screening identifies novel and potent 5-HT1A ligands. ( 0,644902927799567 )
J Chem Inf Model - Kinase-kernel models: accurate in silico screening of 4 million compounds across the entire human kinome. ( 0,643901905784064 )
J Chem Inf Model - Kinome-wide activity modeling from diverse public high-quality data sets. ( 0,64329142954744 )
J Chem Inf Model - Three useful dimensions for domain applicability in QSAR models using random forest. ( 0,637736254935941 )
J Chem Inf Model - BioSM: metabolomics tool for identifying endogenous mammalian biochemical structures in chemical structure space. ( 0,636951226698143 )
Artif Intell Med - Training artificial neural networks directly on the concordance index for censored data using genetic algorithms. ( 0,635375650009819 )
J Chem Inf Model - Exploring uncharted territories: predicting activity cliffs in structure-activity landscapes. ( 0,635023938440348 )
J Chem Inf Model - Modeling drug-induced anorexia by molecular topology. ( 0,63337861785487 )
J Chem Inf Model - Four-dimensional structure-activity relationship model to predict HIV-1 integrase strand transfer inhibition using LQTA-QSAR methodology. ( 0,628489781633991 )
J Chem Inf Model - Prediction of linear cationic antimicrobial peptides based on characteristics responsible for their interaction with the membranes. ( 0,627745204355706 )
J Chem Inf Model - Prediction of compounds with closely related activity profiles using weighted support vector machine linear combinations. ( 0,62663229118821 )
Comput. Biol. Med. - Three dimensional quantitative structure-toxicity relationship modeling and prediction of acute toxicity for organic contaminants to algae. ( 0,623420460714117 )
IEEE Trans Image Process - Neighborhood Supported Model Level Fuzzy Aggregation for Moving Object Segmentation. ( 0,623257392442598 )
J Chem Inf Model - Predicting pK(a) values of substituted phenols from atomic charges: comparison of different quantum mechanical methods and charge distribution schemes. ( 0,619841183635094 )
J Chem Inf Model - Discovery and design of tricyclic scaffolds as protein kinase CK2 (CK2) inhibitors through a combination of shape-based virtual screening and structure-based molecular modification. ( 0,617591509760352 )
J Chem Inf Model - Does rational selection of training and test sets improve the outcome of QSAR modeling? ( 0,616719732949385 )
J Chem Inf Model - In silico prediction of chemical acute oral toxicity using multi-classification methods. ( 0,613512807990995 )
J Chem Inf Model - Quantitative structure-activity relationship models of clinical pharmacokinetics: clearance and volume of distribution. ( 0,613323343945636 )
J Chem Inf Model - Maximum-score diversity selection for early drug discovery. ( 0,613124571614045 )
J Chem Inf Model - Ligand and structure-based classification models for prediction of P-glycoprotein inhibitors. ( 0,612925453631296 )
Comput Methods Programs Biomed - Drug/nondrug classification using Support Vector Machines with various feature selection strategies. ( 0,611917357651109 )
J Chem Inf Model - iLOGP: a simple, robust, and efficient description of n-octanol/water partition coefficient for drug design using the GB/SA approach. ( 0,609931809616568 )
J Chem Inf Model - Leave-cluster-out cross-validation is appropriate for scoring functions derived from diverse protein data sets. ( 0,608262262822189 )
J Chem Inf Model - How experimental errors influence drug metabolism and pharmacokinetic QSAR/QSPR models. ( 0,607876191504188 )
J Chem Inf Model - FAst MEtabolizer (FAME): A rapid and accurate predictor of sites of metabolism in multiple species by endogenous enzymes. ( 0,607538115099263 )
J Chem Inf Model - Study of chromatographic retention of natural terpenoids by chemoinformatic tools. ( 0,606010624172803 )
J. Comput. Biol. - The complexity of the dirichlet model for multiple alignment data. ( 0,605713352312352 )
Methods Inf Med - Supporting regenerative medicine by integrative dimensionality reduction. ( 0,604809537322827 )
Comput. Biol. Med. - Quantification of contributions of molecular fragments for eye irritation of organic chemicals using QSAR study. ( 0,603373280212892 )
J Chem Inf Model - Combined 3D-QSAR, molecular docking, and molecular dynamics study on piperazinyl-glutamate-pyridines/pyrimidines as potent P2Y12 antagonists for inhibition of platelet aggregation. ( 0,602955833773286 )
J Chem Inf Model - Comparative studies on some metrics for external validation of QSPR models. ( 0,602510558382602 )
J Chem Inf Model - Design of novel FLT-3 inhibitors based on dual-layer 3D-QSAR model and fragment-based compounds in silico. ( 0,597740813399152 )
J Chem Inf Model - A Bayesian approach to in silico blood-brain barrier penetration modeling. ( 0,596238029383002 )
J Chem Inf Model - Chemical data visualization and analysis with incremental generative topographic mapping: big data challenge. ( 0,594676048116571 )
J Integr Bioinform - Database supported candidate search for metabolite identification. ( 0,590742630761013 )
J Chem Inf Model - Classifier ensemble based on feature selection and diversity measures for predicting the affinity of A(2B) adenosine receptor antagonists. ( 0,59052675277004 )
Artif Intell Med - Cancer survival classification using integrated data sets and intermediate information. ( 0,589856308275446 )
J Chem Inf Model - Best of both worlds: combining pharma data and state of the art modeling technology to improve in Silico pKa prediction. ( 0,588983606639588 )
J Chem Inf Model - ToxAlerts: a Web server of structural alerts for toxic chemicals and compounds with potential adverse reactions. ( 0,588812901684972 )
J Chem Inf Model - Freely available conformer generation methods: how good are they? ( 0,587415984426551 )
J Chem Inf Model - Fighting high molecular weight in bioactive molecules with sub-pharmacophore-based virtual screening. ( 0,584933242740426 )
J Chem Inf Model - Knowledge-based libraries for predicting the geometric preferences of druglike molecules. ( 0,584467935483858 )
AMIA Annu Symp Proc - Predicting the dengue incidence in Singapore using univariate time series models. ( 0,584433804264405 )
J Chem Inf Model - Applicability domain based on ensemble learning in classification and regression analyses. ( 0,583988890290166 )
J Chem Inf Model - Scaffold diversity of exemplified medicinal chemistry space. ( 0,58240209728 )
J Chem Inf Model - Development of a minimal kinase ensemble receptor (MKER) for surrogate AutoShim. ( 0,582385205485118 )
J Chem Inf Model - Quantitative structure-activity relationship models of chemical transformations from matched pairs analyses. ( 0,582367500705754 )
J Chem Inf Model - Identification of a novel inhibitor of dengue virus protease through use of a virtual screening drug discovery Web portal. ( 0,582301362527785 )
J Chem Inf Model - RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes. ( 0,58193318036348 )
J Chem Inf Model - Development of novel 3D-QSAR combination approach for screening and optimizing B-Raf inhibitors in silico. ( 0,580932053946014 )
J Chem Inf Model - CSAR data set release 2012: ligands, affinities, complexes, and docking decoys. ( 0,579895038203205 )