J Chem Inf Model - Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis.

Topics

{ model(2341) predict(2261) use(1141) }
{ learn(2355) train(1041) set(1003) }
{ compound(1573) activ(1297) structur(1058) }
{ model(2656) set(1616) predict(1553) }
{ gene(2352) biolog(1181) express(1162) }
{ assess(1506) score(1403) qualiti(1306) }
{ howev(809) still(633) remain(590) }
{ perform(1367) use(1326) method(1137) }
{ data(1737) use(1416) pattern(1282) }
{ clinic(1479) use(1117) guidelin(835) }
{ group(2977) signific(1463) compar(1072) }
{ activ(1138) subject(705) human(624) }
{ can(774) often(719) complex(702) }
{ method(1219) similar(1157) match(930) }
{ data(1714) softwar(1251) tool(1186) }
{ studi(1119) effect(1106) posit(819) }
{ can(981) present(881) function(850) }
{ cancer(2502) breast(956) screen(824) }
{ method(1969) cluster(1462) data(1082) }
{ model(3404) distribut(989) bayesian(671) }
{ measur(2081) correl(1212) valu(896) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ chang(1828) time(1643) increas(1301) }
{ design(1359) user(1324) use(1319) }
{ method(984) reconstruct(947) comput(926) }
{ studi(1410) differ(1259) use(1210) }
{ research(1085) discuss(1038) issu(1018) }
{ model(3480) simul(1196) paramet(876) }
{ patient(2837) hospit(1953) medic(668) }
{ first(2504) two(1366) second(1323) }
{ time(1939) patient(1703) rate(768) }
{ high(1669) rate(1365) level(1280) }
{ use(976) code(926) identifi(902) }
{ activ(1452) weight(1219) physic(1104) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ concept(1167) ontolog(924) domain(897) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ sampl(1606) size(1419) use(1276) }
{ data(3008) multipl(1320) sourc(1022) }
{ intervent(3218) particip(2042) group(1664) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Tuberculosis is a major, neglected disease for which the quest to find new treatments continues. There is an abundance of data in the public domain from large phenotypic screens against Mycobacterium tuberculosis (Mtb). Since machine learning methods can learn from past data, we were interested in addressing whether more data builds better models. We now describe using Bayesian machine learning to assess whether we can improve our models by combining the large quantities of single-point data with the much smaller (higher quality) dual-event data sets, which use dose-response data for both whole-cell antitubercular activity and Vero cell cytotoxicity. We have evaluated 12 models ranging from different single-point, dual-event dose-response, single-point and dual-event dose-response as well as combined data sets for three distinct data sets from the same laboratory. We used a fourth data set of active and inactive compounds from the same group as well as a smaller set of 177 active compounds from GlaxoSmithKline as test sets. Our data suggest that combining single-point with dual-event dose-response data does not diminish the internal or external predictive ability of the models, based on the receiver operating characteristic (ROC) curve for these models (internal ROC range 0.83-0.91, external ROC range 0.62-0.83), compared to the orders of magnitude smaller dual-event models (internal ROC range 0.6-0.83 and external ROC 0.54-0.83). In conclusion, models developed with 1200-5000 compounds appear to be as predictive as those generated with 25000-350000 molecules. Our results have implications for justifying further high-throughput screening versus focused testing based on model predictions.
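
The ROC values quoted in the abstract summarize how well a model ranks active compounds above inactive ones. As a minimal sketch of that metric only, and not the software or pipeline used in the study, the area under the ROC curve can be computed from predicted scores with the rank-sum (Mann-Whitney U) identity. The function name `roc_auc` and the toy labels and scores below are illustrative assumptions, not data from the paper.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for an active compound, 0 for an inactive one.
    scores: model output, where higher means "more likely active".
    Tied scores share the average of the ranks they span.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one active and one inactive compound")

    # Sort indices by ascending score, then assign 1-based average ranks.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic for the actives, normalized by the number of pairs.
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


# Toy example (invented values): two actives, two inactives.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

Under this definition, a value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the internal and external ROC ranges above are reported.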

Cleaned Abstract

tuberculosi major neglect diseas quest find new treatment continu abund data larg phenotyp screen public domain mycobacterium tuberculosi mtb sinc machin learn method can learn past data interest address whether data build better model now describ use bayesian machin learn assess whether can improv model combin larg quantiti singlepoint data much smaller higher qualiti dualev data set use doserespons data wholecel antitubercular activ vero cell cytotox evalu model rang differ singlepoint dualev doserespons singlepoint dualev doserespons well combin data set three distinct data set laboratori use fourth data set activ inact compound group well smaller set activ compound glaxosmithklin test set data suggest combin singlepoint dualev doserespons data diminish intern extern predict abil model base receiv oper curv roc model intern roc rang extern roc rang compar order magnitud smaller dualev model intern roc rang extern roc conclus model develop compound appear predict generat molecul result implic justifi highthroughput screen versus focus test base model predict

Similar Abstracts

J Am Med Inform Assoc - Supervised embedding of textual predictors with applications in clinical diagnostics for pediatric cardiology. ( 0,813998237185008 )
Appl Clin Inform - Comparing predictions made by a prediction model, clinical score, and physicians: pediatric asthma exacerbations in the emergency department. ( 0,813761311439017 )
BMC Med Inform Decis Mak - Artificial neural network models for prediction of cardiovascular autonomic dysfunction in general Chinese population. ( 0,804820499399371 )
J Clin Monit Comput - Use of genetic programming, logistic regression, and artificial neural nets to predict readmission after coronary artery bypass surgery. ( 0,80202798266048 )
Comput Math Methods Med - Modified logistic regression models using gene coexpression and clinical features to predict prostate cancer progression. ( 0,801247038266197 )
J Am Med Inform Assoc - An improved model for predicting postoperative nausea and vomiting in ambulatory surgery patients using physician-modifiable risk factors. ( 0,796471445123396 )
Comput Math Methods Med - Variable selection in ROC regression. ( 0,79552063085319 )
Med Decis Making - Performance of a mathematical model to forecast lives saved from HIV treatment expansion in resource-limited settings. ( 0,788079833784706 )
Int J Health Geogr - Prediction of high-risk areas for visceral leishmaniasis using socioeconomic indicators and remote sensing data. ( 0,776301395670962 )
J. Comput. Biol. - Prediction of siRNA potency using sparse logistic regression. ( 0,775017005067436 )
J Biomed Inform - Partial least squares and logistic regression random-effects estimates for gene selection in supervised classification of gene expression data. ( 0,771515597985073 )
BMC Med Inform Decis Mak - Prediction of axillary lymph node metastasis in primary breast cancer patients using a decision tree-based model. ( 0,769640171610412 )
Comput Math Methods Med - Screening for prediabetes using machine learning models. ( 0,766396516830527 )
Artif Intell Med - Machine learning of clinical performance in a pancreatic cancer database. ( 0,765618562934463 )
Comput. Biol. Med. - A knowledge-driven probabilistic framework for the prediction of protein-protein interaction networks. ( 0,765524016328297 )
J Chem Inf Model - Ligand efficiency-based support vector regression models for predicting bioactivities of ligands to drug target proteins. ( 0,765248021417814 )
Comput Methods Programs Biomed - Single stage and multistage classification models for the prediction of liver fibrosis degree in patients with chronic hepatitis C infection. ( 0,763569158822835 )
Artif Intell Med - Prediction of human major histocompatibility complex class II binding peptides by continuous kernel discrimination method. ( 0,757543797550438 )
Med Decis Making - Application of an artificial neural network to predict postinduction hypotension during general anesthesia. ( 0,757509569213797 )
Lifetime Data Anal - Understanding increments in model performance metrics. ( 0,757384224098219 )
BMC Med Inform Decis Mak - A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study. ( 0,756084997386984 )
J Biomed Inform - Decision-making model for early diagnosis of congestive heart failure using rough set and decision tree approaches. ( 0,756057299514116 )
J Chem Inf Model - Pragmatic approaches to using computational methods to predict xenobiotic metabolism. ( 0,747624618203426 )
Med Decis Making - Adaptation of clinical prediction models for application in local settings. ( 0,746691455386599 )
AMIA Annu Symp Proc - Predicting Surgical Risk: How Much Data is Enough? ( 0,74498383986432 )
Comput. Biol. Med. - A ternary model of decompression sickness in rats. ( 0,744884386637815 )
J Chem Inf Model - Two new parameters based on distances in a receiver operating characteristic chart for the selection of classification models. ( 0,737703551565537 )
BMC Med Inform Decis Mak - Evaluation of prediction models for the staging of prostate cancer. ( 0,737213944563939 )
J Chem Inf Model - Predictive toxicology modeling: protocols for exploring hERG classification and Tetrahymena pyriformis end point predictions. ( 0,735236639351643 )
Int J Med Inform - Application of data mining to the identification of critical factors in patient falls using a web-based reporting system. ( 0,734046317612907 )
Med Decis Making - Constructing proper ROCs from ordinal response data using weighted power functions. ( 0,733189075766612 )
J Biomed Inform - Statistical process control for validating a classification tree model for predicting mortality--a novel approach towards temporal validation. ( 0,72940035213064 )
Neural Comput - An extension of the receiver operating characteristic curve and AUC-optimal classification. ( 0,72678223993512 )
Comput Methods Programs Biomed - Development of a daily mortality probability prediction model from Intensive Care Unit patients using a discrete-time event history analysis. ( 0,725903219355699 )
Med Decis Making - A comparison of methods for converting DCE values onto the full health-dead QALY scale. ( 0,721423980833799 )
J Am Med Inform Assoc - Machine learning for predicting the response of breast cancer to neoadjuvant chemotherapy. ( 0,711668760420009 )
Methods Inf Med - Limited sampling strategies to estimate the area under the concentration-time curve. Biases and a proposed more accurate method. ( 0,711292495319431 )
IEEE J Biomed Health Inform - The effect of sample age and prediction resolution on myocardial infarction risk prediction. ( 0,707304854279862 )
J Med Syst - Classifying hospitals as mortality outliers: logistic versus hierarchical logistic models. ( 0,704535842558843 )
Med Decis Making - The Impact of Oversampling with SMOTE on the Performance of 3 Classifiers in Prediction of Type 2 Diabetes. ( 0,703957303761482 )
BMC Med Inform Decis Mak - Bayesian predictors of very poor health related quality of life and mortality in patients with COPD. ( 0,702905701027671 )
BMC Med Inform Decis Mak - Mining geriatric assessment data for in-patient fall prediction models and high-risk subgroups. ( 0,701483060050926 )
BMC Med Inform Decis Mak - Prediction of adverse cardiac events in emergency department patients with chest pain using machine learning for variable selection. ( 0,697945046962081 )
J Med Syst - Effective automated prediction of vertebral column pathologies based on logistic model tree with SMOTE preprocessing. ( 0,694825752248855 )
IEEE Trans Image Process - Network-based H.264/AVC whole frame loss visibility model and frame dropping methods. ( 0,691031389370464 )
BMC Med Inform Decis Mak - Non-linear dynamical signal characterization for prediction of defibrillation success through machine learning. ( 0,689468143131833 )
J Chem Inf Model - Experimental and computational prediction of glass transition temperature of drugs. ( 0,688007178866454 )
J Med Syst - A new approach: role of data mining in prediction of survival of burn patients. ( 0,682682191073125 )
J Chem Inf Model - Using information from historical high-throughput screens to predict active compounds. ( 0,67763798300475 )
Appl Clin Inform - Exploring the value of clinical data standards to predict hospitalization of home care patients. ( 0,676087364299795 )
J Chem Inf Model - FAst MEtabolizer (FAME): A rapid and accurate predictor of sites of metabolism in multiple species by endogenous enzymes. ( 0,675749604293289 )
Brief. Bioinformatics - Added predictive value of high-throughput molecular data to clinical data and its validation. ( 0,672984640163283 )
J Am Med Inform Assoc - Automating annotation of information-giving for analysis of clinical conversation. ( 0,668315705873119 )
Comput Biol Chem - An ensemble method for prediction of conformational B-cell epitopes from antigen sequences. ( 0,667509650000285 )
Comput. Biol. Med. - Pre-operative prediction of surgical morbidity in children: comparison of five statistical models. ( 0,667440662647894 )
Comput Methods Programs Biomed - Exploring an optimal vector autoregressive model for multi-channel pulmonary sound data. ( 0,666353811020912 )
J Chem Inf Model - Elaborate ligand-based modeling coupled with multiple linear regression and k nearest neighbor QSAR analyses unveiled new nanomolar mTOR inhibitors. ( 0,665480994113669 )
BMC Med Inform Decis Mak - Computerized prediction of intensive care unit discharge after cardiac surgery: development and validation of a Gaussian processes model. ( 0,662841312890858 )
J Biomed Inform - Prediction of influenza vaccination outcome by neural networks and logistic regression. ( 0,662725643530093 )
J Chem Inf Model - Capturing the crystal: prediction of enthalpy of sublimation, crystal lattice energy, and melting points of organic compounds. ( 0,660738457516167 )
Comput Methods Programs Biomed - Prediction of postprandial blood glucose under uncertainty and intra-patient variability in type 1 diabetes: a comparative study of three interval models. ( 0,659166438307527 )
BMC Med Inform Decis Mak - Use of outcomes to evaluate surveillance systems for bioterrorist attacks. ( 0,657389998943621 )
Med Decis Making - Performance profiling in primary care: does the choice of statistical model matter? ( 0,656521676580462 )
AMIA Annu Symp Proc - Development and implementation of a real-time 30-day readmission predictive model. ( 0,655232303703533 )
J Biomed Inform - Gene-disease association with literature based enrichment. ( 0,654579395231677 )
J Chem Inf Model - Ligand-based virtual screening approach using a new scoring function. ( 0,653085615455102 )
Spat Spatiotemporal Epidemiol - Assessment of land use factors associated with dengue cases in Malaysia using Boosted Regression Trees. ( 0,651715640831264 )
Comput. Biol. Med. - A leave-one-out cross-validation SAS macro for the identification of markers associated with survival. ( 0,650228683277286 )
Comput Math Methods Med - Iterative reweighted noninteger norm regularizing SVM for gene expression data classification. ( 0,649505210378355 )
J Biomed Inform - Not just data: a method for improving prediction with knowledge. ( 0,649126762198636 )
IEEE Trans Image Process - Image annotation by input-output structural grouping sparsity. ( 0,647991390858292 )
Methods Inf Med - Sensor-based fall risk assessment--an expert 'to go'. ( 0,647607930048158 )
AMIA Annu Symp Proc - Developing predictive models using electronic medical records: challenges and pitfalls. ( 0,646865795100277 )
IEEE Trans Image Process - Monotonic regression: a new way for correlating subjective and objective ratings in image quality research. ( 0,64658214804753 )
J Med Syst - Comparison of artificial neural networks with logistic regression for detection of obesity. ( 0,645705649153084 )
J Biomed Inform - Synergistic effect of different levels of genomic data for cancer clinical outcome prediction. ( 0,645303056056298 )
J Chem Inf Model - Comparison of random forest and Pipeline Pilot Naïve Bayes in prospective QSAR predictions. ( 0,643769319857833 )
Comput Math Methods Med - Prediction of BP reactivity to talking using hybrid soft computing approaches. ( 0,642324990645906 )
Med Biol Eng Comput - System identification of the mechanomyogram from single motor units during voluntary isometric contraction. ( 0,641680760877478 )
J Chem Inf Model - Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach. ( 0,6399622587225 )
BMC Med Inform Decis Mak - Risk factors for adverse reactions from contrast agents for computed tomography. ( 0,639383317591376 )
Brief. Bioinformatics - Caveats and pitfalls of ROC analysis in clinical microarray research (and how to avoid them). ( 0,634560041810715 )
Med Decis Making - Lehmann family of ROC curves. ( 0,6319807105776 )
Comput Methods Programs Biomed - Recurrence predictive models for patients with hepatocellular carcinoma after radiofrequency ablation using support vector machines with feature selection methods. ( 0,631395455012711 )
Artif Intell Med - Improved modeling of clinical data with kernel methods. ( 0,630511496079365 )
Methods Inf Med - Classification of postural profiles among mouth-breathing children by learning vector quantization. ( 0,629930718921015 )
Int J Health Geogr - Ecological niche model of Phlebotomus alexandri and P. papatasi (Diptera: Psychodidae) in the Middle East. ( 0,628756579885015 )
BMC Med Inform Decis Mak - Decision curve analysis revisited: overall net benefit, relationships to ROC curve analysis, and application to case-control studies. ( 0,628615448145861 )
BMC Med Inform Decis Mak - An evidential reasoning based model for diagnosis of lymph node metastasis in gastric cancer. ( 0,628466783130999 )
Med Decis Making - Development of inpatient risk stratification models of acute kidney injury for use in electronic health records. ( 0,627334937319982 )
J Biomed Inform - An empirical approach to model selection through validation for censored survival data. ( 0,627254942909665 )
AMIA Annu Symp Proc - Outlier Detection with One-Class SVMs: An Application to Melanoma Prognosis. ( 0,623711720474563 )
Brief. Bioinformatics - Adjusting confounders in ranking biomarkers: a model-based ROC approach. ( 0,620864648536655 )
AMIA Annu Symp Proc - Learning medical diagnosis models from multiple experts. ( 0,617885000659458 )
Artif Intell Med - Predicting the need for CT imaging in children with minor head injury using an ensemble of Naive Bayes classifiers. ( 0,615739259096533 )
Int J Health Geogr - Assessing the effects of variables and background selection on the capture of the tick climate niche. ( 0,611672788683813 )
Spat Spatiotemporal Epidemiol - Modeling habitat suitability for occurrence of highly pathogenic avian influenza virus H5N1 in domestic poultry in Asia: a spatial multicriteria decision analysis approach. ( 0,60966108285235 )
BMC Med Inform Decis Mak - Diabetic retinopathy risk prediction for fundus examination using sparse learning: a cross-sectional study. ( 0,607966702212419 )
J Chem Inf Model - Homology modeling of human muscarinic acetylcholine receptors. ( 0,607615634184446 )
Comput Biol Chem - Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions. ( 0,605441071826949 )