BMC Med Inform Decis Mak - Predicting disease risks from highly imbalanced data using random forest.

Topics

{ model(2341) predict(2261) use(1141) }
{ learn(2355) train(1041) set(1003) }
{ method(2212) result(1239) propos(1039) }
{ featur(3375) classif(2383) classifi(1994) }
{ sequenc(1873) structur(1644) protein(1328) }
{ patient(2315) diseas(1263) diabet(1191) }
{ monitor(1329) mobil(1314) devic(1160) }
{ risk(3053) factor(974) diseas(938) }
{ intervent(3218) particip(2042) group(1664) }
{ use(976) code(926) identifi(902) }
{ ehr(2073) health(1662) electron(1139) }
{ cost(1906) reduc(1198) effect(832) }
{ use(1733) differ(960) four(931) }
{ decis(3086) make(1611) patient(1517) }
{ system(1976) rule(880) can(841) }
{ motion(1329) object(1292) video(1091) }
{ framework(1458) process(801) describ(734) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(984) reconstruct(947) comput(926) }
{ system(1050) medic(1026) inform(1018) }
{ health(1844) social(1437) communiti(874) }
{ measur(2081) correl(1212) valu(896) }
{ problem(2511) optim(1539) algorithm(950) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ control(1307) perform(991) simul(935) }
{ health(3367) inform(1360) care(1135) }
{ state(1844) use(1261) util(961) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ sampl(1606) size(1419) use(1276) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ time(1939) patient(1703) rate(768) }
{ can(981) present(881) function(850) }
{ high(1669) rate(1365) level(1280) }
{ cancer(2502) breast(956) screen(824) }
{ implement(1333) system(1263) develop(1122) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ method(1219) similar(1157) match(930) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ extract(1171) text(1153) clinic(932) }
{ design(1359) user(1324) use(1319) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ case(1353) use(1143) diagnosi(1136) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ perform(999) metric(946) measur(919) }
{ research(1085) discuss(1038) issu(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ model(3480) simul(1196) paramet(876) }
{ research(1218) medic(880) student(794) }
{ model(2656) set(1616) predict(1553) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ activ(1138) subject(705) human(624) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ structur(1116) can(940) graph(676) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

BACKGROUND: We present a method utilizing the Healthcare Cost and Utilization Project (HCUP) dataset for predicting disease risk of individuals based on their medical diagnosis history. The presented methodology may be incorporated in a variety of applications such as risk management, tailored health communication and decision support systems in healthcare.

METHODS: We employed the National Inpatient Sample (NIS) data, which is publicly available through the Healthcare Cost and Utilization Project (HCUP), to train random forest (RF) classifiers for disease prediction. Since the HCUP data is highly imbalanced, we employed an ensemble learning approach based on repeated random sub-sampling. This technique divides the training data into multiple sub-samples, while ensuring that each sub-sample is fully balanced. We compared the performance of support vector machine (SVM), bagging, boosting and RF to predict the risk of eight chronic diseases.

RESULTS: We predicted eight disease categories. Overall, the RF ensemble learning method outperformed SVM, bagging and boosting in terms of the area under the receiver operating characteristic (ROC) curve (AUC). In addition, RF has the advantage of computing the importance of each variable in the classification process.

CONCLUSIONS: In combining repeated random sub-sampling with RF, we were able to overcome the class imbalance problem and achieve promising results. Using the national HCUP data set, we predicted eight disease categories with an average AUC of 88.79%.
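The repeated random sub-sampling idea described in the abstract can be illustrated with a minimal sketch. It rests on assumptions the abstract does not confirm: each sub-sample is taken to hold every minority-class record plus an equal-sized random draw from the majority class, and the ensemble score is taken to be the plain average of the per-model scores. All function names are hypothetical, and `centroid_learner` is a toy stand-in for a random forest so that the example needs only the standard library. The `auc` helper uses the standard pairwise definition of AUC (the fraction of positive/negative pairs ranked correctly, with ties counted as half).

```python
import random
from collections import Counter


def balanced_subsamples(labels, n_subsamples, seed=0):
    """Return index lists, each with equal numbers of both classes.

    Assumption: every sub-sample keeps all minority records and adds an
    equal-sized random draw (without replacement) from the majority class.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    minority_label = min(counts, key=counts.get)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    subsamples = []
    for _ in range(n_subsamples):
        idx = minority + rng.sample(majority, len(minority))
        rng.shuffle(idx)
        subsamples.append(idx)
    return subsamples


def train_ensemble(X, y, train_fn, n_subsamples=10, seed=0):
    """Train one model per balanced sub-sample; train_fn returns a scorer."""
    models = []
    for idx in balanced_subsamples(y, n_subsamples, seed):
        models.append(train_fn([X[i] for i in idx], [y[i] for i in idx]))
    return models


def ensemble_score(models, x):
    """Average the scores of all ensemble members for one record."""
    return sum(m(x) for m in models) / len(models)


def auc(scores, labels, positive=1):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_learner(X, y):
    """Toy stand-in for a random forest (NOT the paper's classifier).

    Scores 1.0 when the first feature is closer to the positive-class mean
    than to the negative-class mean, else 0.0.
    """
    pos = [x[0] for x, label in zip(X, y) if label == 1]
    neg = [x[0] for x, label in zip(X, y) if label != 1]
    pos_mean = sum(pos) / len(pos)
    neg_mean = sum(neg) / len(neg)

    def score(x):
        return 1.0 if abs(x[0] - pos_mean) < abs(x[0] - neg_mean) else 0.0

    return score


# Synthetic, highly imbalanced data: 95 negatives, 5 positives (made up).
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0)] for _ in range(95)]
X += [[data_rng.gauss(3.0, 1.0)] for _ in range(5)]
y = [0] * 95 + [1] * 5

models = train_ensemble(X, y, centroid_learner, n_subsamples=10, seed=1)
scores = [ensemble_score(models, x) for x in X]
print("ensemble AUC on the training data:", auc(scores, y))
```

Any learner that returns a per-record score could be passed as `train_fn` in place of the toy one, for example a wrapper around a random forest implementation. The AUC printed here is computed on the training data of a synthetic example and says nothing about the results reported in the abstract.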

Cleaned Abstract

background present method util healthcar cost util project hcup dataset predict diseas risk individu base medic diagnosi histori present methodolog may incorpor varieti applic risk manag tailor health communic decis support system healthcar method employ nation inpati sampl nis data public avail healthcar cost util project hcup train random forest classifi diseas predict sinc hcup data high imbalanc employ ensembl learn approach base repeat random subsampl techniqu divid train data multipl subsampl ensur subsampl fulli balanc compar perform support vector machin svm bag boost rf predict risk eight chronic diseas result predict eight diseas categori overal rf ensembl learn method outperform svm bag boost term area receiv oper characterist roc curv auc addit rf advantag comput import variabl classif process conclus combin repeat random subsampl rf abl overcom class imbal problem achiev promis result use nation hcup data set predict eight diseas categori averag auc

Similar Abstracts

Comput Methods Programs Biomed - Prediction of human breast and colon cancers from imbalanced data using nearest neighbor and support vector machines. ( 0,722000213459138 )
J Med Syst - A new approach: role of data mining in prediction of survival of burn patients. ( 0,678897109589521 )
J Am Med Inform Assoc - Applying active learning to supervised word sense disambiguation in MEDLINE. ( 0,677420421860584 )
J Am Med Inform Assoc - Machine learning for predicting the response of breast cancer to neoadjuvant chemotherapy. ( 0,6711183926191 )
Comput Biol Chem - Prediction of protein modification sites of gamma-carboxylation using position specific scoring matrices based evolutionary information. ( 0,666738439672524 )
J Am Med Inform Assoc - Predicting complications of percutaneous coronary intervention using a novel support vector method. ( 0,64562800717567 )
Artif Intell Med - White box radial basis function classifiers with component selection for clinical prediction models. ( 0,639515695790383 )
J Am Med Inform Assoc - From vital signs to clinical outcomes for patients with sepsis: a machine learning basis for a clinical decision support system. ( 0,63842231957872 )
Comput Methods Programs Biomed - Discriminating protein structure classes by incorporating Pseudo Average Chemical Shift to Chou's general PseAAC and Support Vector Machine. ( 0,636874926920642 )
J Am Med Inform Assoc - Supervised embedding of textual predictors with applications in clinical diagnostics for pediatric cardiology. ( 0,633266911308448 )
Comput. Biol. Med. - A learning method for the class imbalance problem with medical data sets. ( 0,632368909738541 )
Comput. Biol. Med. - Signal peptide discrimination and cleavage site identification using SVM and NN. ( 0,627018170814883 )
Comput Math Methods Med - Iterative reweighted noninteger norm regularizing SVM for gene expression data classification. ( 0,625740764934148 )
Int J Med Inform - Prediction of hospitalization due to heart diseases by supervised learning methods. ( 0,624035124621631 )
Methods Inf Med - An experimental evaluation of boosting methods for classification. ( 0,622995633480025 )
Comput Methods Programs Biomed - Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. ( 0,621794195038607 )
Med Decis Making - The Impact of Oversampling with SMOTE on the Performance of 3 Classifiers in Prediction of Type 2 Diabetes. ( 0,621403736613811 )
J. Comput. Biol. - Prediction of siRNA potency using sparse logistic regression. ( 0,621080593517534 )
Artif Intell Med - Prediction of human major histocompatibility complex class II binding peptides by continuous kernel discrimination method. ( 0,614014161017625 )
Spat Spatiotemporal Epidemiol - Assessment of land use factors associated with dengue cases in Malaysia using Boosted Regression Trees. ( 0,609668280414875 )
J Chem Inf Model - Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach. ( 0,607984364043995 )
Comput Biol Chem - newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation. ( 0,606502739350429 )
J Am Med Inform Assoc - A sequence labeling approach to link medications and their attributes in clinical notes and clinical trial announcements for information extraction. ( 0,605289526835987 )
Comput Methods Programs Biomed - Recurrence predictive models for patients with hepatocellular carcinoma after radiofrequency ablation using support vector machines with feature selection methods. ( 0,605079402606969 )
BMC Med Inform Decis Mak - A three-step approach for the derivation and validation of high-performing predictive models using an operational dataset: congestive heart failure readmission case study. ( 0,603521220610276 )
Comput. Biol. Med. - Prediction of pre-miRNA with multiple stem-loops using pruning algorithm. ( 0,602669360609129 )
Med Decis Making - A comparison of methods for converting DCE values onto the full health-dead QALY scale. ( 0,598197359797919 )
J Biomed Inform - An empirical approach to model selection through validation for censored survival data. ( 0,597914127757945 )
Comput Methods Programs Biomed - ThyroScreen system: high resolution ultrasound thyroid image characterization into benign and malignant classes using novel combination of texture and discrete wavelet transform. ( 0,597875597512153 )
J Med Syst - Effective automated prediction of vertebral column pathologies based on logistic model tree with SMOTE preprocessing. ( 0,597626633186064 )
Comput Methods Programs Biomed - Machine learning algorithms and forced oscillation measurements to categorise the airway obstruction severity in chronic obstructive pulmonary disease. ( 0,597241958040248 )
Comput Biol Chem - CE-PLoc: an ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition. ( 0,59038404842204 )
J Med Syst - Diagnosing breast masses in digital mammography using feature selection and ensemble methods. ( 0,58911791697972 )
Artif Intell Med - Machine learning of clinical performance in a pancreatic cancer database. ( 0,589009384545578 )
Comput. Biol. Med. - A novel algorithm combining support vector machine with the discrete wavelet transform for the prediction of protein subcellular localization. ( 0,585150073666742 )
Comput. Biol. Med. - Medical decision support system for diagnosis of neuromuscular disorders using DWT and fuzzy support vector machines. ( 0,583256053101781 )
IEEE J Biomed Health Inform - Classification of color images of dermatological ulcers. ( 0,582241475686103 )
Methods Inf Med - Sensor-based fall risk assessment--an expert 'to go'. ( 0,582020314268745 )
IEEE Trans Image Process - Efficient image classification via multiple rank regression. ( 0,57842644557745 )
Comput Biol Chem - Using ensemble methods to deal with imbalanced data in predicting protein-protein interactions. ( 0,576941768265084 )
BMC Med Inform Decis Mak - Mining geriatric assessment data for in-patient fall prediction models and high-risk subgroups. ( 0,576135395939297 )
AMIA Annu Symp Proc - Outlier Detection with One-Class SVMs: An Application to Melanoma Prognosis. ( 0,576028882113753 )
Comput Math Methods Med - An efficient diagnosis system for Parkinson's disease using kernel-based extreme learning machine with subtractive clustering features weighting approach. ( 0,575012637375245 )
IEEE J Biomed Health Inform - The effect of sample age and prediction resolution on myocardial infarction risk prediction. ( 0,574727423207593 )
J Biomed Inform - Use of Medical Subject Headings (MeSH) in Portuguese for categorizing web-based healthcare content. ( 0,573733215512284 )
Comput Biol Chem - An improved poly(A) motifs recognition method based on decision level fusion. ( 0,572803263727217 )
BMC Med Inform Decis Mak - Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes. ( 0,572610654071946 )
Comput. Biol. Med. - FRAN and RBF-PSO as two components of a hyper framework to recognize protein folds. ( 0,571923871400873 )
J Chem Inf Model - Classifying large chemical data sets: using a regularized potential function method. ( 0,571444123196546 )
Brief. Bioinformatics - Critical assessment of high-throughput standalone methods for secondary structure prediction. ( 0,570298748659794 )
Artif Intell Med - Improving the Mann-Whitney statistical test for feature selection: an approach in breast cancer diagnosis on mammography. ( 0,570169879129468 )
Comput. Biol. Med. - Pre-operative prediction of surgical morbidity in children: comparison of five statistical models. ( 0,566548067972304 )
Comput. Biol. Med. - Combined prediction of transmembrane topology and signal peptide of beta-barrel proteins: using a hidden Markov model and genetic algorithms. ( 0,566079378706447 )
Comput Methods Programs Biomed - Machine learning algorithms and forced oscillation measurements applied to the automatic identification of chronic obstructive pulmonary disease. ( 0,565866374965547 )
IEEE J Biomed Health Inform - Novel fractal feature-based multiclass glaucoma detection and progression prediction. ( 0,564317232967178 )
Neural Comput - Extended robust support vector machine based on financial risk minimization. ( 0,562948297465384 )
Comput. Biol. Med. - A hybrid intelligent system for diagnosing microalbuminuria in type 2 diabetes patients without having to measure urinary albumin. ( 0,561987883250767 )
Comput. Biol. Med. - Identification of voltage-gated potassium channel subfamilies from sequence information using support vector machine. ( 0,558197593982719 )
J Am Med Inform Assoc - A patient-driven adaptive prediction technique to improve personalized risk estimation for clinical decision support. ( 0,557612565156687 )
J Am Med Inform Assoc - An improved model for predicting postoperative nausea and vomiting in ambulatory surgery patients using physician-modifiable risk factors. ( 0,556632718763396 )
J Am Med Inform Assoc - Word sense disambiguation in the clinical domain: a comparison of knowledge-rich and knowledge-poor unsupervised methods. ( 0,556000494222881 )
J Biomed Inform - Decision-making model for early diagnosis of congestive heart failure using rough set and decision tree approaches. ( 0,553009880397722 )
IEEE J Biomed Health Inform - Multiple kernel learning in the primal for multimodal Alzheimer's disease classification. ( 0,551527812996873 )
J Am Med Inform Assoc - A novel method of adverse event detection can accurately identify venous thromboembolisms (VTEs) from narrative electronic health record data. ( 0,550925194526687 )
J Clin Monit Comput - Heart rate variability analysis during central hypovolemia using wavelet transformation. ( 0,550112126961196 )
J Biomed Inform - Statistical process control for validating a classification tree model for predicting mortality--a novel approach towards temporal validation. ( 0,549913471097276 )
BMC Med Inform Decis Mak - Evaluation of prediction models for the staging of prostate cancer. ( 0,546075706452942 )
J Biomed Inform - Classification of CT pulmonary angiography reports by presence, chronicity, and location of pulmonary embolism with natural language processing. ( 0,545966773473219 )
J Am Med Inform Assoc - Automating annotation of information-giving for analysis of clinical conversation. ( 0,545548717428318 )
Methods Inf Med - Classification of postural profiles among mouth-breathing children by learning vector quantization. ( 0,545365222248671 )
AMIA Annu Symp Proc - Learning medical diagnosis models from multiple experts. ( 0,545233456849122 )
Comput. Biol. Med. - A knowledge-driven probabilistic framework for the prediction of protein-protein interaction networks. ( 0,544828261431385 )
J Chem Inf Model - Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. ( 0,544066873196854 )
BMC Med Inform Decis Mak - An improved survivability prognosis of breast cancer by using sampling and feature selection technique to solve imbalanced patient classification data. ( 0,543132356714457 )
Artif Intell Med - Supervised machine learning-based classification of oral malodor based on the microbiota in saliva samples. ( 0,542902083099937 )
J Biomed Inform - Protein contact map prediction using multi-stage hybrid intelligence inference systems. ( 0,542188802424401 )
J Chem Inf Model - Pragmatic approaches to using computational methods to predict xenobiotic metabolism. ( 0,542036768163184 )
AMIA Annu Symp Proc - Clinical risk prediction by exploring high-order feature correlations. ( 0,540860434112388 )
Comput. Biol. Med. - Remote protein homology detection and fold recognition using two-layer support vector machine classifiers. ( 0,540345695398516 )
Comput Methods Programs Biomed - Design of fuzzy classifier for diabetes disease using Modified Artificial Bee Colony algorithm. ( 0,539122718380634 )
J Am Med Inform Assoc - Computer-aided diagnosis of pneumonia in patients with chronic obstructive pulmonary disease. ( 0,538081162218613 )
J Med Syst - The association forecasting of 13 variants within seven asthma susceptibility genes on 3 serum IgE groups in Taiwanese population by integrating of adaptive neuro-fuzzy inference system (ANFIS) and classification analysis methods. ( 0,537538900211894 )
Comput. Biol. Med. - Robust prediction of protein subcellular localization combining PCA and WSVMs. ( 0,535931937796277 )
Neural Comput - An extension of the receiver operating characteristic curve and AUC-optimal classification. ( 0,535322213160415 )
Comput Methods Programs Biomed - Comparative evaluation of support vector machines for computer aided diagnosis of lung cancer in CT based on a multi-dimensional data set. ( 0,535286284953396 )
AMIA Annu Symp Proc - Predicting discharge mortality after acute ischemic stroke using balanced data. ( 0,534458605218969 )
BMC Med Inform Decis Mak - Non-linear dynamical signal characterization for prediction of defibrillation success through machine learning. ( 0,533657566928607 )
AMIA Annu Symp Proc - Improving predictions in imbalanced data using Pairwise Expanded Logistic Regression. ( 0,532687419604824 )
AMIA Annu Symp Proc - Decision path models for patient-specific modeling of patient outcomes. ( 0,530910348518578 )
Comput Math Methods Med - Modified logistic regression models using gene coexpression and clinical features to predict prostate cancer progression. ( 0,530822328260314 )
J Integr Bioinform - Modelling proteolytic enzymes with Support Vector Machines. ( 0,530388687585305 )
Comput Math Methods Med - Correlation kernels for support vector machines classification with applications in cancer data. ( 0,530042615151548 )
Comput Methods Programs Biomed - Single stage and multistage classification models for the prediction of liver fibrosis degree in patients with chronic hepatitis C infection. ( 0,52917154904522 )
Artif Intell Med - Predicting the need for CT imaging in children with minor head injury using an ensemble of Naive Bayes classifiers. ( 0,528958730581592 )
Int J Med Inform - Application of data mining to the identification of critical factors in patient falls using a web-based reporting system. ( 0,528845356650135 )
Comput Methods Programs Biomed - Prediction of postprandial blood glucose under uncertainty and intra-patient variability in type 1 diabetes: a comparative study of three interval models. ( 0,527891174028966 )
Comput Methods Programs Biomed - Computer-aided diagnosis of breast masses using quantified BI-RADS findings. ( 0,527594021155239 )
BMC Med Inform Decis Mak - Artificial neural network models for prediction of cardiovascular autonomic dysfunction in general Chinese population. ( 0,527090202533591 )
J Med Syst - An integrated index for the identification of diabetic retinopathy stages using texture parameters. ( 0,526539712537733 )
Med Biol Eng Comput - Mortality prediction of rats in acute hemorrhagic shock using machine learning techniques. ( 0,526539712537733 )