J Chem Inf Model - Binary classification of aqueous solubility using support vector machines with reduction and recombination feature selection.

Topics

{ model(2656) set(1616) predict(1553) }
{ featur(3375) classif(2383) classifi(1994) }
{ compound(1573) activ(1297) structur(1058) }
{ howev(809) still(633) remain(590) }
{ control(1307) perform(991) simul(935) }
{ model(2341) predict(2261) use(1141) }
{ cancer(2502) breast(956) screen(824) }
{ care(1570) inform(1187) nurs(1089) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ learn(2355) train(1041) set(1003) }
{ studi(1410) differ(1259) use(1210) }
{ data(2317) use(1299) case(1017) }
{ data(1737) use(1416) pattern(1282) }
{ measur(2081) correl(1212) valu(896) }
{ sequenc(1873) structur(1644) protein(1328) }
{ take(945) account(800) differ(722) }
{ treatment(1704) effect(941) patient(846) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ featur(1941) imag(1645) propos(1176) }
{ data(3963) clinic(1234) research(1004) }
{ system(1050) medic(1026) inform(1018) }
{ spatial(1525) area(1432) region(1030) }
{ model(3480) simul(1196) paramet(876) }
{ research(1218) medic(880) student(794) }
{ age(1611) year(1155) adult(843) }
{ gene(2352) biolog(1181) express(1162) }
{ use(2086) technolog(871) perceiv(783) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ assess(1506) score(1403) qualiti(1306) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ error(1145) method(1030) estim(1020) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ model(2220) cell(1177) simul(1124) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ case(1353) use(1143) diagnosi(1136) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ patient(2837) hospit(1953) medic(668) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ sampl(1606) size(1419) use(1276) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ drug(1928) target(777) effect(648) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Aqueous solubility is recognized as a critical parameter in both early- and late-stage drug discovery. In silico modeling of solubility has therefore attracted extensive interest in recent years. Most previous studies were limited to relatively small data sets with limited diversity, which in turn limits the predictive power of the derived models. In this work, we present a support vector machine (SVM) model for the binary classification of solubility that takes advantage of the largest known public data set, which contains over 46,000 compounds with experimental solubility. Our model was optimized in combination with a reduction and recombination feature selection strategy. The best model demonstrated robust performance in both cross-validation and prediction of two independent test sets, indicating that it could be a practical tool for selecting soluble compounds for screening, purchasing, and synthesis. Moreover, because it relies entirely on public resources, our work may be used for comparative evaluation of solubility classification studies.
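
As a rough illustration of what binary classification with a support vector machine involves, the sketch below trains a linear SVM (hinge loss with an L2 penalty) by full-batch subgradient descent. It is a simplified stand-in, not the model from the paper: the abstract does not state the kernel, solver, descriptors, or solubility threshold, and the reduction and recombination feature selection step is not reproduced. The function names, the four two-descriptor vectors, and the hyperparameters are all hypothetical.

```python
# Minimal linear SVM sketch: hinge loss + L2 penalty, full-batch subgradient descent.
# Everything below (data, names, hyperparameters) is a hypothetical illustration.

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Return weights w that approximately minimize
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i))."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # margin violated: add the hinge subgradient
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def predict(w, x):
    """Label a compound +1 (soluble) or -1 (insoluble) by the sign of w.x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1

# Hypothetical mean-centered descriptor vectors, so no bias term is used.
X = [(2.0, 0.0), (3.0, 1.0), (-2.0, 0.0), (-3.0, -1.0)]
y = [1, 1, -1, -1]

w = train_linear_svm(X, y)
accuracy = sum(predict(w, xi) == yi for xi, yi in zip(X, y)) / len(X)
print(accuracy)
```

A real study at this scale would more likely rely on an established SVM library and evaluate on held-out folds and external test sets rather than on the training data, as the abstract describes.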

Cleaned Abstract

aqueous solubl recogn critic paramet earli latestag drug discoveri therefor silico model solubl attract extens interest recent year previous studi limit use relat small data set limit divers turn limit predict deriv model work present support vector machin model binari classif solubl take advantag largest known public data set contain compound experiment solubl model optim combin reduct recombin featur select strategi best model demonstr robust perform crossvalid predict two independ test set indic practic tool select solubl compound screen purchas synthes moreov work may use compar evalu solubl classif studi ascrib use complet public resourc

Similar Abstracts

J Chem Inf Model - Binary classification of a large collection of environmental chemicals from estrogen receptor assays by quantitative structure-activity relationship and machine learning methods. ( 0,754933890984733 )
J Chem Inf Model - Comparison of random forest and Pipeline Pilot Naïve Bayes in prospective QSAR predictions. ( 0,736340997385449 )
J Chem Inf Model - Oversampling to overcome overfitting: exploring the relationship between data set composition, molecular descriptors, and predictive modeling methods. ( 0,732448065669938 )
J Chem Inf Model - Coping with unbalanced class data sets in oral absorption models. ( 0,731464860050562 )
J Chem Inf Model - In silico prediction of chemical Ames mutagenicity. ( 0,731245219436063 )
J Chem Inf Model - Experimental and computational prediction of glass transition temperature of drugs. ( 0,717350140250116 )
J Chem Inf Model - Classification of compounds with distinct or overlapping multi-target activities and diverse molecular mechanisms using emerging chemical patterns. ( 0,709992687238201 )
J Chem Inf Model - Predictions of BuChE inhibitors using support vector machine and naive Bayesian classification techniques in drug discovery. ( 0,706777176707817 )
J Chem Inf Model - Quantitative structure-activity relationship models for ready biodegradability of chemicals. ( 0,705292498082017 )
Med Biol Eng Comput - Validating motor unit firing patterns extracted by EMG signal decomposition. ( 0,694329764240521 )
J Chem Inf Model - Discovering new agents active against methicillin-resistant Staphylococcus aureus with ligand-based approaches. ( 0,682959806115485 )
AMIA Annu Symp Proc - Effect of data combination on predictive modeling: a study using gene expression data. ( 0,672280398578246 )
J Chem Inf Model - Hsp90 inhibitors, part 1: definition of 3-D QSAutogrid/R models as a tool for virtual screening. ( 0,667543283306573 )
J Chem Inf Model - Statistical analysis and compound selection of combinatorial libraries for soluble epoxide hydrolase. ( 0,66589726500893 )
BMC Med Inform Decis Mak - Regression tree construction by bootstrap: model search for DRG-systems applied to Austrian health-data. ( 0,664256748671491 )
J Chem Inf Model - Pre-processing feature selection for improved C&RT models for oral absorption. ( 0,659522583106269 )
J Chem Inf Model - In silico prediction of total human plasma clearance. ( 0,657770191277908 )
J Chem Inf Model - Pharmacophore assessment through 3-D QSAR: evaluation of the predictive ability on new derivatives by the application on a series of antitubercular agents. ( 0,656127051688336 )
J Chem Inf Model - Structure based model for the prediction of phospholipidosis induction potential of small molecules. ( 0,654181447405682 )
Comput. Biol. Med. - In silico prediction of spleen tyrosine kinase inhibitors using machine learning approaches and an optimized molecular descriptor subset generated by recursive feature elimination method. ( 0,65123509031466 )
Med Biol Eng Comput - Application of the RIMARC algorithm to a large data set of action potentials and clinical parameters for risk prediction of atrial fibrillation. ( 0,645328142144333 )
J Chem Inf Model - GA(M)E-QSAR: a novel, fully automatic genetic-algorithm-(meta)-ensembles approach for binary classification in ligand-based drug design. ( 0,640160174384855 )
J Chem Inf Model - Applicability Domain ANalysis (ADAN): a robust method for assessing the reliability of drug property predictions. ( 0,637402661392543 )
J Chem Inf Model - Jointly handling potency and toxicity of antimicrobial peptidomimetics by simple rules from desirability theory and chemoinformatics. ( 0,634532567286635 )
J Chem Inf Model - A comparison of different QSAR approaches to modeling CYP450 1A2 inhibition. ( 0,632686650452512 )
J Chem Inf Model - Design and synthesis of new antioxidants predicted by the model developed on a set of pulvinic acid derivatives. ( 0,632409502796114 )
IEEE Trans Image Process - Neighborhood Supported Model Level Fuzzy Aggregation for Moving Object Segmentation. ( 0,630562493187253 )
J Chem Inf Model - Classifier ensemble based on feature selection and diversity measures for predicting the affinity of A(2B) adenosine receptor antagonists. ( 0,627970523637459 )
J Chem Inf Model - In silico prediction of aqueous solubility using simple QSPR models: the importance of phenol and phenol-like moieties. ( 0,627738071094371 )
J Chem Inf Model - A binary ant colony optimization classifier for molecular activities. ( 0,624055098337122 )
J Chem Inf Model - Analysis and study of molecule data sets using snowflake diagrams of weighted maximum common subgraph trees. ( 0,622436262266905 )
J Chem Inf Model - A new approach to radial basis function approximation and its application to QSAR. ( 0,613608175190761 )
J Chem Inf Model - Three useful dimensions for domain applicability in QSAR models using random forest. ( 0,60776948275687 )
AMIA Annu Symp Proc - Motivating the additional use of external validity: examining transportability in a model of glioblastoma multiforme. ( 0,606800094799317 )
J Chem Inf Model - Ligand and structure-based classification models for prediction of P-glycoprotein inhibitors. ( 0,605960593498002 )
J Chem Inf Model - Beyond the scope of Free-Wilson analysis: building interpretable QSAR models with machine learning algorithms. ( 0,605189119176014 )
Artif Intell Med - Training artificial neural networks directly on the concordance index for censored data using genetic algorithms. ( 0,598364441180935 )
J Chem Inf Model - In silico prediction of chemical acute oral toxicity using multi-classification methods. ( 0,598291795310626 )
J Chem Inf Model - FAst MEtabolizer (FAME): A rapid and accurate predictor of sites of metabolism in multiple species by endogenous enzymes. ( 0,596106963095032 )
Comput. Biol. Med. - A prediction model of substrates and non-substrates of breast cancer resistance protein (BCRP) developed by GA-CG-SVM method. ( 0,593199857644941 )
J Biomed Inform - Selection of interdependent genes via dynamic relevance analysis for cancer diagnosis. ( 0,592523072404163 )
J Chem Inf Model - Four-dimensional structure-activity relationship model to predict HIV-1 integrase strand transfer inhibition using LQTA-QSAR methodology. ( 0,591717740817217 )
J Chem Inf Model - Time-split cross-validation as a method for estimating the goodness of prospective prediction. ( 0,589432539070614 )
J Chem Inf Model - Predictive models for cytochrome p450 isozymes based on quantitative high throughput screening data. ( 0,589379691754502 )
J Chem Inf Model - Profile-QSAR: a novel meta-QSAR method that combines activities across the kinase family to accurately predict affinity, selectivity, and cellular activity. ( 0,588729246805388 )
Artif Intell Med - Cancer survival classification using integrated data sets and intermediate information. ( 0,588247401402652 )
Int J Comput Assist Radiol Surg - Brain tumor classification on intraoperative contrast-enhanced ultrasound. ( 0,587685811469952 )
J Chem Inf Model - Prediction of linear cationic antimicrobial peptides based on characteristics responsible for their interaction with the membranes. ( 0,584899584513915 )
J Chem Inf Model - RS-Predictor models augmented with SMARTCyp reactivities: robust metabolic regioselectivity predictions for nine CYP isozymes. ( 0,582020634528908 )
J Chem Inf Model - Choosing feature selection and learning algorithms in QSAR. ( 0,581893067783159 )
BMC Med Inform Decis Mak - Concordance and predictive value of two adverse drug event data sets. ( 0,581397247672972 )
J Chem Inf Model - Applicability domain based on ensemble learning in classification and regression analyses. ( 0,581066609235352 )
J Med Syst - Usage of case-based reasoning, neural network and adaptive neuro-fuzzy inference system classification techniques in breast cancer dataset classification diagnosis. ( 0,58075268764813 )
J Chem Inf Model - Automated building of organometallic complexes from 3D fragments. ( 0,580385462892272 )
J Chem Inf Model - Predictive toxicology modeling: protocols for exploring hERG classification and Tetrahymena pyriformis end point predictions. ( 0,579672163614344 )
J Chem Inf Model - Prediction of compound potency changes in matched molecular pairs using support vector regression. ( 0,578063259479735 )
J Chem Inf Model - Profile-QSAR and Surrogate AutoShim protein-family modeling of proteases. ( 0,57798059570965 )
Comput. Biol. Med. - Extracting predictive SNPs in Crohn's disease using a vacillating genetic algorithm and a neural classifier in case-control association studies. ( 0,577655872818732 )
J Chem Inf Model - Does rational selection of training and test sets improve the outcome of QSAR modeling? ( 0,577188145929117 )
Neural Comput - High-dimensional cluster analysis with the masked EM algorithm. ( 0,575917558755382 )
J Chem Inf Model - iLOGP: a simple, robust, and efficient description of n-octanol/water partition coefficient for drug design using the GB/SA approach. ( 0,575008067771835 )
J Chem Inf Model - Predicting pK(a) values of substituted phenols from atomic charges: comparison of different quantum mechanical methods and charge distribution schemes. ( 0,574922026511745 )
J Chem Inf Model - Best of both worlds: combining pharma data and state of the art modeling technology to improve in Silico pKa prediction. ( 0,57455355179869 )
J Chem Inf Model - Kinome-wide activity modeling from diverse public high-quality data sets. ( 0,572505723320301 )
J Chem Inf Model - Impact of template choice on homology model efficiency in virtual screening. ( 0,572167463467753 )
J Am Med Inform Assoc - Drug repurposing: mining protozoan proteomes for targets of known bioactive compounds. ( 0,571735662807519 )
J Chem Inf Model - Exploring uncharted territories: predicting activity cliffs in structure-activity landscapes. ( 0,571046907817734 )
J Chem Inf Model - Study of chromatographic retention of natural terpenoids by chemoinformatic tools. ( 0,569891658751447 )
J Am Med Inform Assoc - Harvest: an open platform for developing web-based biomedical data discovery and reporting applications. ( 0,567451120356466 )
Int J Neural Syst - On the segmentation and classification of hand radiographs. ( 0,566658970037315 )
J Chem Inf Model - Revisiting the general solubility equation: in silico prediction of aqueous solubility incorporating the effect of topographical polar surface area. ( 0,564806263661852 )
J Chem Inf Model - Predicting myelosuppression of drugs from in silico models. ( 0,561926161840869 )
J Chem Inf Model - Prediction of compounds with closely related activity profiles using weighted support vector machine linear combinations. ( 0,561266914091007 )
J Chem Inf Model - Fusing dual-event data sets for Mycobacterium tuberculosis machine learning models and their evaluation. ( 0,557707877750547 )
J. Comput. Biol. - The complexity of the dirichlet model for multiple alignment data. ( 0,557382880680485 )
J Chem Inf Model - Development of novel 3D-QSAR combination approach for screening and optimizing B-Raf inhibitors in silico. ( 0,556683050579124 )
Int J Health Geogr - Comparative analysis of remotely-sensed data products via ecological niche modeling of avian influenza case occurrences in Middle Eastern poultry. ( 0,555529741860742 )
Lifetime Data Anal - Analysis of cure rate survival data under proportional odds model. ( 0,555179723610421 )
J Chem Inf Model - A Bayesian approach to in silico blood-brain barrier penetration modeling. ( 0,552954388052642 )
J Integr Bioinform - Classification of breast cancer subtypes by combining gene expression and DNA methylation data. ( 0,55232997967363 )
J Chem Inf Model - QSAR classification model for antibacterial compounds and its use in virtual screening. ( 0,550893625078833 )
J Chem Inf Model - GRID-based three-dimensional pharmacophores II: PharmBench, a benchmark data set for evaluating pharmacophore elucidation methods. ( 0,549205113462055 )
Comput. Biol. Med. - SVM-based feature selection to optimize sensitivity-specificity balance applied to weaning. ( 0,548531954144081 )
J Med Syst - A new data preparation method based on clustering algorithms for diagnosis systems of heart and diabetes diseases. ( 0,547188454445416 )
Comput. Biol. Med. - Decision forest for classification of gene expression data. ( 0,546279280800196 )
J Med Syst - Luminance sticker based facial expression recognition using discrete wavelet transform for physically disabled persons. ( 0,545397460173452 )
Comput Biol Chem - A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM. ( 0,544935020975465 )
J Chem Inf Model - How accurately can we predict the melting points of drug-like compounds? ( 0,544775905229621 )
BMC Med Inform Decis Mak - Measuring preferences for analgesic treatment for cancer pain: how do African-Americans and Whites perform on choice-based conjoint (CBC) analysis experiments? ( 0,544373722704077 )
Comput. Biol. Med. - Quantification of contributions of molecular fragments for eye irritation of organic chemicals using QSAR study. ( 0,544360588205058 )
J Chem Inf Model - A new protocol for predicting novel GSK-3β ATP competitive inhibitors. ( 0,542843527969865 )
J Med Syst - Utilization of electronic medical records to build a detection model for surveillance of healthcare-associated urinary tract infections. ( 0,540367574968989 )
J Chem Inf Model - Kinase-kernel models: accurate in silico screening of 4 million compounds across the entire human kinome. ( 0,54024351801749 )
Comput. Biol. Med. - Classification of breast regions as mass and non-mass based on digital mammograms using taxonomic indexes and SVM. ( 0,538113389829637 )
J Chem Inf Model - Construction and use of fragment-augmented molecular Hasse diagrams. ( 0,53713968604778 )
Int J Comput Assist Radiol Surg - Assessing performance in brain tumor resection using a novel virtual reality simulator. ( 0,536289205610489 )
J Chem Inf Model - QSAR modeling of imbalanced high-throughput screening data in PubChem. ( 0,535502291474302 )
J Chem Inf Model - Discovery of novel antimalarial compounds enabled by QSAR-based virtual screening. ( 0,534828187300698 )
J Med Syst - An intelligent system for lung cancer diagnosis using a new genetic algorithm based feature selection method. ( 0,533614895843926 )
J Chem Inf Model - Design of novel FLT-3 inhibitors based on dual-layer 3D-QSAR model and fragment-based compounds in silico. ( 0,533415425760338 )