J Chem Inf Model - GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design.

Topics

{ featur(3375) classif(2383) classifi(1994) }
{ compound(1573) activ(1297) structur(1058) }
{ learn(2355) train(1041) set(1003) }
{ model(2656) set(1616) predict(1553) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ import(1318) role(1303) understand(862) }
{ gene(2352) biolog(1181) express(1162) }
{ assess(1506) score(1403) qualiti(1306) }
{ search(2224) databas(1162) retriev(909) }
{ system(1050) medic(1026) inform(1018) }
{ use(1733) differ(960) four(931) }
{ sequenc(1873) structur(1644) protein(1328) }
{ research(1085) discuss(1038) issu(1018) }
{ perform(1367) use(1326) method(1137) }
{ activ(1452) weight(1219) physic(1104) }
{ method(2212) result(1239) propos(1039) }
{ can(774) often(719) complex(702) }
{ studi(1119) effect(1106) posit(819) }
{ group(2977) signific(1463) compar(1072) }
{ data(3008) multipl(1320) sourc(1022) }
{ structur(1116) can(940) graph(676) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ network(2748) neural(1063) input(814) }
{ studi(2440) review(1878) systemat(933) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ error(1145) method(1030) estim(1020) }
{ method(984) reconstruct(947) comput(926) }
{ howev(809) still(633) remain(590) }
{ perform(999) metric(946) measur(919) }
{ spatial(1525) area(1432) region(1030) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ analysi(2126) use(1163) compon(1037) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ process(1125) use(805) approach(778) }
{ model(3404) distribut(989) bayesian(671) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ treatment(1704) effect(941) patient(846) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ sampl(1606) size(1419) use(1276) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ can(981) present(881) function(850) }
{ health(1844) social(1437) communiti(874) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ method(1969) cluster(1462) data(1082) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Computer-aided drug design has become an important component of the drug discovery process. Despite advances in this field, no single modeling approach can be applied successfully to the whole range of problems faced during QSAR modeling. Feature selection and ensemble modeling are active areas of research in ligand-based drug design. Here we introduce the GA(M)E-QSAR algorithm, which combines the search and optimization capabilities of Genetic Algorithms with the simplicity of the AdaBoost ensemble-based classification algorithm to solve binary classification problems. We also explore the usefulness of Meta-Ensembles trained with AdaBoost and Voting schemes to further improve the accuracy, generalization, and robustness of the optimal AdaBoost Single Ensemble derived from the Genetic Algorithm optimization. We evaluated the performance of our algorithm on five data sets from the literature and found that it yields classification results similar to or better than those previously reported for these data sets, with a higher enrichment of active compounds relative to the whole actives subset when only the most active chemicals are considered. More importantly, we compared our methodology with state-of-the-art feature selection and classification approaches and found that it can provide highly accurate, robust, and generalizable models. In the case of the AdaBoost Ensembles derived from the Genetic Algorithm search, the final models are quite simple, since they consist of a weighted sum of the outputs of single-feature classifiers. Furthermore, the AdaBoost scores can be used as a ranking criterion to prioritize chemicals for synthesis and biological evaluation after virtual screening experiments.
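
The "weighted sum of the outputs of single-feature classifiers" and the use of the resulting score for ranking can be illustrated with a minimal sketch. This is not the authors' implementation: it is plain discrete AdaBoost over threshold classifiers (decision stumps), it omits the Genetic Algorithm search and the Meta-Ensembles entirely, and the descriptor values, the +1 (active) / -1 (inactive) label encoding, and all function names are assumptions made for illustration.

```python
import math

def stump_predict(row, j, t, sign):
    """Single-feature classifier: +sign if descriptor j exceeds t, else -sign."""
    return sign if row[j] > t else -sign

def fit_stump(X, y, w):
    """Pick the (feature, threshold, sign) with the lowest weighted error.
    Assumes at least one feature takes two distinct values."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for t in [(a + b) / 2 for a, b in zip(values, values[1:])]:
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, sign)
    return best

def adaboost(X, y, rounds=3):
    """Discrete AdaBoost; returns a list of (alpha, feature, threshold, sign)."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, j, t, sign = fit_stump(X, y, w)
        err = max(err, 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, j, t, sign))
        # Up-weight misclassified compounds, down-weight the rest, renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def score(ensemble, row):
    """Weighted sum of stump outputs; the sign is the predicted class."""
    return sum(alpha * stump_predict(row, j, t, sign)
               for alpha, j, t, sign in ensemble)

# Hypothetical data: two descriptors per compound, +1 = active, -1 = inactive.
X = [[0.10, 0.2], [0.20, 0.8], [0.30, 0.3], [0.25, 0.9],
     [0.70, 0.4], [0.80, 0.6], [0.90, 0.1]]
y = [-1, -1, -1, 1, 1, 1, 1]

ensemble = adaboost(X, y, rounds=3)
# Rank compounds by AdaBoost score, highest first, as a prioritization list.
ranked = sorted(range(len(X)), key=lambda i: score(ensemble, X[i]), reverse=True)
for i in ranked:
    print(i, y[i], round(score(ensemble, X[i]), 3))
```

In a GA-driven setting, one would presumably restrict `fit_stump` to the descriptor subset encoded by each chromosome and use a validation metric of the resulting ensemble as the fitness; that wiring is left out here because the abstract does not specify it.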

Cleaned Abstract

computeraid drug design becom import compon drug discoveri process despit advanc field uniqu model approach can success appli solv whole rang problem face qsar model featur select ensembl model activ area research ligandbas drug design introduc gameqsar algorithm combin search optim capabl genet algorithm simplic adaboost ensemblebas classif algorithm solv binari classif problem also explor use metaensembl train adaboost vote scheme improv accuraci general robust optim adaboost singl ensembl deriv genet algorithm optim evalu perform algorithm use five data set literatur found capabl yield similar better classif result report data set higher enrich activ compound relat whole activ subset activ chemic consid import compar methodolog state art featur select classif approach found can provid high accur robust generaliz model case adaboost ensembl deriv genet algorithm search final model quit simpl sinc consist weight sum output singl featur classifi furthermor adaboost score can use rank criterion priorit chemic synthesi biolog evalu virtual screen experi

Similar Abstracts

J Chem Inf Model - Binary classification of a large collection of environmental chemicals from estrogen receptor assays by quantitative structure-activity relationship and machine learning methods. ( 0,752245904361356 )
J Chem Inf Model - Oversampling to overcome overfitting: exploring the relationship between data set composition, molecular descriptors, and predictive modeling methods. ( 0,730420892979734 )
J Chem Inf Model - A binary ant colony optimization classifier for molecular activities. ( 0,698047581258906 )
J Chem Inf Model - In silico prediction of chemical Ames mutagenicity. ( 0,687551582920017 )
J Chem Inf Model - Classification of compounds with distinct or overlapping multi-target activities and diverse molecular mechanisms using emerging chemical patterns. ( 0,683986190903401 )
J Chem Inf Model - Pre-processing feature selection for improved C&RT models for oral absorption. ( 0,683179788097548 )
J Chem Inf Model - Structure based model for the prediction of phospholipidosis induction potential of small molecules. ( 0,67455150097285 )
J Chem Inf Model - Discovering new agents active against methicillin-resistant Staphylococcus aureus with ligand-based approaches. ( 0,671600536026492 )
J Chem Inf Model - Classifying large chemical data sets: using a regularized potential function method. ( 0,668413508039705 )
J Chem Inf Model - Jointly handling potency and toxicity of antimicrobial peptidomimetics by simple rules from desirability theory and chemoinformatics. ( 0,667802426548645 )
J Chem Inf Model - Predictions of BuChE inhibitors using support vector machine and naive Bayesian classification techniques in drug discovery. ( 0,666760715666995 )
Comput. Biol. Med. - In silico prediction of spleen tyrosine kinase inhibitors using machine learning approaches and an optimized molecular descriptor subset generated by recursive feature elimination method. ( 0,665347207061175 )
J Chem Inf Model - SVM classification and CoMSIA modeling of UGT1A6 interacting molecules. ( 0,660221174492688 )
J Chem Inf Model - Classifying molecules using a sparse probabilistic kernel binary classifier. ( 0,656036589049478 )
J Chem Inf Model - Coping with unbalanced class data sets in oral absorption models. ( 0,655671421802934 )
J Chem Inf Model - In silico prediction of chemical acute oral toxicity using multi-classification methods. ( 0,645910433141996 )
J Chem Inf Model - Prediction of activity cliffs using support vector machines. ( 0,642283937432145 )
J Integr Bioinform - Modelling proteolytic enzymes with Support Vector Machines. ( 0,641250211754394 )
Comput. Biol. Med. - Decision forest for classification of gene expression data. ( 0,640969242110833 )
J Chem Inf Model - Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection. ( 0,640160174384854 )
J Chem Inf Model - Modeling and benchmark data set for the inhibition of c-Jun N-terminal kinase-3. ( 0,639367675190626 )
Comput. Biol. Med. - Extracting predictive SNPs in Crohn's disease using a vacillating genetic algorithm and a neural classifier in case-control association studies. ( 0,636205678845251 )
Comput Methods Programs Biomed - Drug/nondrug classification using Support Vector Machines with various feature selection strategies. ( 0,636013216342343 )
Comput Methods Programs Biomed - A heuristic biomarker selection approach based on professional tennis player ranking strategy. ( 0,633526211164135 )
J Chem Inf Model - Prediction of aquatic toxicity mode of action using linear discriminant and random forest models. ( 0,629808407811459 )
J Chem Inf Model - Comparison of confirmed inactive and randomly selected compounds as negative training examples in support vector machine-based virtual screening. ( 0,627591690209756 )
J Chem Inf Model - Design of combinatorial libraries for the exploration of virtual hits from fragment space searches with LoFT. ( 0,626704478186606 )
IEEE Trans Pattern Anal Mach Intell - Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data. ( 0,624603872686885 )
IEEE J Biomed Health Inform - Multiple kernel learning in the primal for multimodal Alzheimer's disease classification. ( 0,624150965962691 )
J Chem Inf Model - Profile-QSAR and Surrogate AutoShim protein-family modeling of proteases. ( 0,623379171766641 )
J Chem Inf Model - Modeling drug-induced anorexia by molecular topology. ( 0,619327454996721 )
J Chem Inf Model - Compound set enrichment: a novel approach to analysis of primary HTS data. ( 0,618922284888465 )
J Chem Inf Model - A comparison of different QSAR approaches to modeling CYP450 1A2 inhibition. ( 0,617414994610855 )
J Chem Inf Model - Construction and use of fragment-augmented molecular Hasse diagrams. ( 0,614364926182462 )
J Chem Inf Model - Classifier ensemble based on feature selection and diversity measures for predicting the affinity of A(2B) adenosine receptor antagonists. ( 0,613486836602765 )
J. Med. Internet Res. - Web-based newborn screening system for metabolic diseases: machine learning versus clinicians. ( 0,611718585728083 )
IEEE Trans Neural Netw Learn Syst - ML-Tree: a tree-structure-based approach to multilabel learning. ( 0,611113203738803 )
J Chem Inf Model - Discovery of novel antimalarial compounds enabled by QSAR-based virtual screening. ( 0,609774590100173 )
J Chem Inf Model - Ligand and structure-based classification models for prediction of P-glycoprotein inhibitors. ( 0,609117764039388 )
J Chem Inf Model - Profile-QSAR: a novel meta-QSAR method that combines activities across the kinase family to accurately predict affinity, selectivity, and cellular activity. ( 0,607872608549034 )
J Chem Inf Model - Application of support vector machine to three-dimensional shape-based virtual screening using comprehensive three-dimensional molecular shape overlay with known inhibitors. ( 0,606404088516898 )
J Chem Inf Model - In silico assessment of chemical biodegradability. ( 0,604946985001598 )
Comput Math Methods Med - Comparison of two methods forecasting binding rate of plasma protein. ( 0,603881489973197 )
J Chem Inf Model - Hsp90 inhibitors, part 1: definition of 3-D QSAutogrid/R models as a tool for virtual screening. ( 0,602187323013889 )
J. Comput. Biol. - Biomarker discovery using statistically significant gene sets. ( 0,601220265376621 )
Comput. Biol. Med. - Relabeling algorithm for retrieval of noisy instances and improving prediction quality. ( 0,60081166853862 )
AMIA Annu Symp Proc - Predicting discharge mortality after acute ischemic stroke using balanced data. ( 0,600778656001517 )
Methods Inf Med - Supporting regenerative medicine by integrative dimensionality reduction. ( 0,599712806332185 )
J Chem Inf Model - Conditional probabilities of activity landscape features for individual compounds. ( 0,599513058101306 )
J Chem Inf Model - Quantitative structure-activity relationship models for ready biodegradability of chemicals. ( 0,598723004098671 )
J Chem Inf Model - QSAR classification model for antibacterial compounds and its use in virtual screening. ( 0,597661998022374 )
J Chem Inf Model - In silico prediction of total human plasma clearance. ( 0,597009557252273 )
Neural Comput - High-dimensional cluster analysis with the masked EM algorithm. ( 0,595845632071345 )
J Chem Inf Model - Predictive models for cytochrome p450 isozymes based on quantitative high throughput screening data. ( 0,595741864142419 )
J Chem Inf Model - Revisiting the general solubility equation: in silico prediction of aqueous solubility incorporating the effect of topographical polar surface area. ( 0,592878318375732 )
Comput Math Methods Med - Mixed-norm regularization for brain decoding. ( 0,591479705777768 )
J Chem Inf Model - Prediction of chemical biodegradability using support vector classifier optimized with differential evolution. ( 0,589571973428179 )
J Chem Inf Model - Exploring uncharted territories: predicting activity cliffs in structure-activity landscapes. ( 0,588168916624739 )
J Chem Inf Model - Introduction of a methodology for visualization and graphical interpretation of Bayesian classification models. ( 0,586774048737038 )
Comput Methods Programs Biomed - A random forest classifier for lymph diseases. ( 0,586455760288616 )
Artif Intell Med - Supervised machine learning-based classification of oral malodor based on the microbiota in saliva samples. ( 0,585603729587118 )
J Chem Inf Model - Prediction of compounds with closely related activity profiles using weighted support vector machine linear combinations. ( 0,585183046282704 )
J Chem Inf Model - Synthesis, bioassay, and molecular field topology analysis of diverse vasodilatory heterocycles. ( 0,584735183932019 )
J Chem Inf Model - How do 2D fingerprints detect structurally diverse active compounds? Revealing compound subset-specific fingerprint features through systematic selection. ( 0,584657958850601 )
J Am Med Inform Assoc - Drug repurposing: mining protozoan proteomes for targets of known bioactive compounds. ( 0,583532208538376 )
J Chem Inf Model - Target-independent prediction of drug synergies using only drug lipophilicity. ( 0,582176210481924 )
J Chem Inf Model - Applicability Domain ANalysis (ADAN): a robust method for assessing the reliability of drug property predictions. ( 0,580079025704302 )
J Integr Bioinform - Reducing the n-gram feature space of class C GPCRs to subtype-discriminating patterns. ( 0,578736341316991 )
Comput. Biol. Med. - SVM-based feature selection to optimize sensitivity-specificity balance applied to weaning. ( 0,57820468553041 )
IEEE Trans Image Process - A unified feature and instance selection framework using optimum experimental design. ( 0,577766903402488 )
J Med Syst - A three-stage expert system based on support vector machines for thyroid disease diagnosis. ( 0,575619966786664 )
Comput Biol Chem - Multi objective SNP selection using pareto optimality. ( 0,574621484364473 )
J Chem Inf Model - Prediction of linear cationic antimicrobial peptides based on characteristics responsible for their interaction with the membranes. ( 0,574512072628214 )
J Chem Inf Model - Large-scale learning of structure-activity relationships using a linear support vector machine and problem-specific metrics. ( 0,573136899324357 )
J Biomed Inform - A biological continuum based approach for efficient clinical classification. ( 0,572995761497586 )
J Chem Inf Model - Design and synthesis of new antioxidants predicted by the model developed on a set of pulvinic acid derivatives. ( 0,572442089811619 )
J Chem Inf Model - Application of the 4D fingerprint method with a robust scoring function for scaffold-hopping and drug repurposing strategies. ( 0,572343743980127 )
J Chem Inf Model - Knowledge-based libraries for predicting the geometric preferences of druglike molecules. ( 0,571996657147122 )
J Biomed Inform - An efficient statistical feature selection approach for classification of gene expression data. ( 0,570760096722819 )
J Chem Inf Model - Characterizing the diversity and biological relevance of the MLPCN assay manifold and screening set. ( 0,570472257424696 )
J Chem Inf Model - Cross-target view to feature selection: identification of molecular interaction features in ligand-target space. ( 0,570082979044644 )
J Integr Bioinform - On the parameter optimization of Support Vector Machines for binary classification. ( 0,570059737520232 )
J Chem Inf Model - A new protocol for predicting novel GSK-3β ATP competitive inhibitors. ( 0,569891849127445 )
Comput. Biol. Med. - Gene expression data classification using locally linear discriminant embedding. ( 0,569781472465584 )
Comput Biol Chem - Information-theoretic approaches to SVM feature selection for metagenome read classification. ( 0,56934767810263 )
Comput Methods Programs Biomed - An attribute weight assignment and particle swarm optimization algorithm for medical database classifications. ( 0,568774733865739 )
Comput Biol Chem - CE-PLoc: an ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition. ( 0,566931070251752 )
J Chem Inf Model - Experimental and computational prediction of glass transition temperature of drugs. ( 0,56609317629737 )
J Chem Inf Model - Do not hesitate to use Tversky-and other hints for successful active analogue searches with feature count descriptors. ( 0,565594913033474 )
Comput Math Methods Med - Discrimination between Alzheimer's disease and mild cognitive impairment using SOM and PSO-SVM. ( 0,565475836947137 )
J Biomed Inform - A medical diagnostic tool based on radial basis function classifiers and evolutionary simulated annealing. ( 0,564699067895694 )
Artif Intell Med - Texture feature ranking with relevance learning to classify interstitial lung disease patterns. ( 0,564685299317966 )
Comput. Biol. Med. - Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders. ( 0,564097729139315 )
J Chem Inf Model - LiCABEDS II. Modeling of ligand selectivity for G-protein-coupled cannabinoid receptors. ( 0,563975606419571 )
J Chem Inf Model - Classification of cytochrome P450 inhibitors and noninhibitors using combined classifiers. ( 0,563693993774475 )
J Chem Inf Model - Statistical analysis and compound selection of combinatorial libraries for soluble epoxide hydrolase. ( 0,563402822109258 )
Artif Intell Med - Screening nonrandomized studies for medical systematic reviews: a comparative study of classifiers. ( 0,562560257570991 )
IEEE J Biomed Health Inform - Automatic detection of atrial fibrillation in cardiac vibration signals. ( 0,562092702859878 )
J Chem Inf Model - BioSM: metabolomics tool for identifying endogenous mammalian biochemical structures in chemical structure space. ( 0,561677481924963 )
Brief. Bioinformatics - Ensemble learning algorithms for classification of mtDNA into haplogroups. ( 0,561667082276371 )