Med Decis Making - The Impact of Oversampling with SMOTE on the Performance of 3 Classifiers in Prediction of Type 2 Diabetes.

Topics

{ learn(2355) train(1041) set(1003) }
{ model(2341) predict(2261) use(1141) }
{ risk(3053) factor(974) diseas(938) }
{ perform(999) metric(946) measur(919) }
{ age(1611) year(1155) adult(843) }
{ imag(1057) registr(996) error(939) }
{ case(1353) use(1143) diagnosi(1136) }
{ perform(1367) use(1326) method(1137) }
{ imag(1947) propos(1133) code(1026) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ model(2656) set(1616) predict(1553) }
{ can(981) present(881) function(850) }
{ detect(2391) sensit(1101) algorithm(908) }
{ model(3404) distribut(989) bayesian(671) }
{ measur(2081) correl(1212) valu(896) }
{ patient(2315) diseas(1263) diabet(1191) }
{ studi(2440) review(1878) systemat(933) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ chang(1828) time(1643) increas(1301) }
{ clinic(1479) use(1117) guidelin(835) }
{ model(2220) cell(1177) simul(1124) }
{ compound(1573) activ(1297) structur(1058) }
{ record(1888) medic(1808) patient(1693) }
{ medic(1828) order(1363) alert(1069) }
{ cost(1906) reduc(1198) effect(832) }
{ sampl(1606) size(1419) use(1276) }
{ intervent(3218) particip(2042) group(1664) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ cancer(2502) breast(956) screen(824) }
{ result(1111) use(1088) new(759) }
{ decis(3086) make(1611) patient(1517) }
{ method(2212) result(1239) propos(1039) }
{ can(774) often(719) complex(702) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ method(1219) similar(1157) match(930) }
{ featur(3375) classif(2383) classifi(1994) }
{ imag(2675) segment(2577) method(1081) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ concept(1167) ontolog(924) domain(897) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ howev(809) still(633) remain(590) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ visual(1396) interact(850) tool(830) }
{ studi(1119) effect(1106) posit(819) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ data(2317) use(1299) case(1017) }
{ signal(2180) analysi(812) frequenc(800) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ activ(1138) subject(705) human(624) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ high(1669) rate(1365) level(1280) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }

Abstract

OBJECTIVE: To evaluate the impact of the synthetic minority oversampling technique (SMOTE) on the performance of probabilistic neural network (PNN), naïve Bayes (NB), and decision tree (DT) classifiers for predicting diabetes in a prospective cohort of the Tehran Lipid and Glucose Study (TLGS).

METHODS: Data from 6647 nondiabetic participants aged 20 years or older with more than 10 years of follow-up were used to develop prediction models based on 21 common risk factors. The minority class in the training dataset was oversampled with SMOTE at 100%, 200%, 300%, 400%, 500%, 600%, and 700% of its original size. Classification models were built on both the original and the oversampled training datasets. Accuracy, sensitivity, specificity, precision, F-measure, and Youden's index were used to evaluate classifier performance on the test dataset. The ROC convex hull (ROCCH) was used to compare the 3 classification models.

RESULTS: Oversampling the minority class at 700% (complete balance) increased the sensitivity of the PNN, DT, and NB by 64%, 51%, and 5%, respectively, but decreased the accuracy and specificity of all 3 classification methods. NB had the best Youden's index both before and after oversampling. The ROCCH showed that the PNN is suboptimal under all class and cost conditions.

CONCLUSIONS: When building a classifier with a machine learning algorithm such as the PNN or DT, class skew in the data should be taken into account. NB and DT were the optimal classifiers for this prediction task in an imbalanced medical database.
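The oversampling step in the abstract can be sketched as follows. This is a minimal, hypothetical illustration of the standard SMOTE interpolation, not the study's code. It rests on several assumptions, none of which the abstract states:

- continuous features;
- Euclidean distance;
- k = 5 nearest neighbours by default;
- "N%" read as creating N/100 synthetic samples per minority case.

The function name `smote` is illustrative. As in the study, it would be applied to the training set only, so the test set keeps its original class skew.

```python
import math
import random


def smote(minority, percent, k=5, seed=None):
    """Hypothetical SMOTE sketch: return synthetic minority-class samples.

    minority: list of feature vectors (lists of floats), minority class only.
    percent:  oversampling amount, a positive multiple of 100 (100 ... 700);
              each minority sample yields percent // 100 synthetic samples.
    k:        number of nearest minority neighbours to interpolate towards.
    """
    if percent <= 0 or percent % 100 != 0:
        raise ValueError("percent must be a positive multiple of 100")
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    per_sample = percent // 100
    k = min(k, len(minority) - 1)
    synthetic = []
    for i, x in enumerate(minority):
        # Assumption: Euclidean distance on (ideally scaled) features.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        for _ in range(per_sample):
            nn = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            # New point lies on the segment between x and its chosen neighbour.
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


minority = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
new = smote(minority, 700, k=3, seed=0)
print(len(new))  # 28
```

With four minority points and `percent=700`, the call returns 28 synthetic points, each lying between two existing minority samples. The synthetic points would then be appended to the training set alongside the original data.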

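The abstract evaluates the classifiers with six confusion-matrix metrics and compares them using the ROC convex hull (ROCCH). Below is a minimal sketch of both, assuming binary labels with incident diabetes as the positive class. Some details are illustrative only:

- The names `classification_metrics` and `rocch_vertices` are hypothetical.
- The counts and ROC points in the usage lines are made up and are not TLGS results.
- The hull is computed with Andrew's monotone-chain algorithm, one of several ways to build it.

```python
import math


def _ratio(num, den):
    # Return NaN instead of raising when a metric is undefined.
    return num / den if den else math.nan


def classification_metrics(tp, fp, tn, fn):
    """Metrics named in the abstract, from binary confusion-matrix counts."""
    sensitivity = _ratio(tp, tp + fn)  # true-positive rate (recall)
    specificity = _ratio(tn, tn + fp)  # true-negative rate
    precision = _ratio(tp, tp + fp)    # positive predictive value
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        # F-measure: harmonic mean of precision and sensitivity.
        "f_measure": _ratio(2 * precision * sensitivity, precision + sensitivity),
        # Youden's index J = sensitivity + specificity - 1.
        "youden": sensitivity + specificity - 1,
        # False-positive rate: the classifier's x-coordinate in ROC space.
        "fpr": 1 - specificity,
    }


def _cross(o, a, b):
    # >= 0 means a lies on or below the line from o to b (x sorted ascending).
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def rocch_vertices(roc_points):
    """Names of classifiers that are vertices of the ROC convex hull.

    roc_points: dict name -> (fpr, tpr). The trivial classifiers (0, 0) and
    (1, 1) are added as anchors. A classifier strictly below the hull is
    suboptimal for every class distribution and cost ratio; one lying on a
    hull edge but not at a vertex can at best tie.
    """
    pts = [(0.0, 0.0, None), (1.0, 1.0, None)]
    pts += [(fpr, tpr, name) for name, (fpr, tpr) in roc_points.items()]
    pts.sort(key=lambda p: (p[0], p[1]))
    hull = []
    for p in pts:
        # Upper hull: drop points on or below the chord to the new point.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return [name for _, _, name in hull if name is not None]


m = classification_metrics(tp=60, fp=100, tn=800, fn=40)
print(round(m["youden"], 3))  # 0.489
print(rocch_vertices({"clf_a": (0.2, 0.7), "clf_b": (0.4, 0.85), "clf_c": (0.3, 0.6)}))
# ['clf_a', 'clf_b']
```

In the usage lines, `clf_c` falls below the segment joining `clf_a` and `clf_b`. It is therefore excluded from the hull, which is the sense in which a classifier is "suboptimal for any class and cost conditions."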
Cleaned Abstract

jectiv evalu impact synthet minor oversampl techniqu smote perform probabilist neural network pnn nave bay nb decis tree dt classifi predict diabet prospect cohort tehran lipid glucos studi tlgsmethod data nondiabet particip age year older year followup use develop predict model base common risk factor minor class train dataset oversampl use smote techniqu origin size origin oversampl train dataset use establish classif model accuraci sensit specif precis fmeasur youden index use evalu perform classifi test dataset compar perform classif model use roc convex hull rocchresult oversampl minor class complet balanc increas sensit pnn dt nb respect decreas accuraci specif classif method nb best youden index oversampl rocch show pnn suboptim class cost conditionsconclus determin classifi machin learn algorithm like pnn dt class skew data consid nb dt optim classifi predict task imbalanc medic databas

Similar Abstracts

J Am Med Inform Assoc - Supervised embedding of textual predictors with applications in clinical diagnostics for pediatric cardiology. ( 0,868506061385812 )
AMIA Annu Symp Proc - Outlier Detection with One-Class SVMs: An Application to Melanoma Prognosis. ( 0,821733526481815 )
J Chem Inf Model - Training based on ligand efficiency improves prediction of bioactivities of ligands and drug target proteins in a machine learning approach. ( 0,800277386932 )
Artif Intell Med - Prediction of human major histocompatibility complex class II binding peptides by continuous kernel discrimination method. ( 0,797248386445529 )
AMIA Annu Symp Proc - Learning medical diagnosis models from multiple experts. ( 0,78456881389201 )
Artif Intell Med - Machine learning of clinical performance in a pancreatic cancer database. ( 0,760263729506571 )
J Am Med Inform Assoc - Learning classification models with soft-label information. ( 0,749158916959884 )
IEEE Trans Image Process - Unsupervised amplitude and texture classification of SAR images with multinomial latent model. ( 0,745194724275024 )
J Biomed Inform - Applying active learning to assertion classification of concepts in clinical text. ( 0,724304358170979 )
IEEE Trans Pattern Anal Mach Intell - Distance-Based Image Classification: Generalizing to New Classes at Near Zero Cost. ( 0,711927717261711 )
Comput Math Methods Med - On multilabel classification methods of incompletely labeled biomedical text data. ( 0,707376321863488 )
J Chem Inf Model - Are bigger data sets better for machine learning? Fusing single-point and dual-event dose response data for Mycobacterium tuberculosis. ( 0,703957303761482 )
IEEE Trans Neural Netw Learn Syst - A Kernel Classification Framework for Metric Learning. ( 0,700593438623507 )
Comput Math Methods Med - Correlation kernels for support vector machines classification with applications in cancer data. ( 0,698376556792591 )
IEEE Trans Pattern Anal Mach Intell - Weakly Supervised Recognition of Daily Life Activities with Wearable Sensors. ( 0,69432606340165 )
J. Comput. Biol. - Imbalanced class learning in epigenetics. ( 0,692515203048151 )
J Chem Inf Model - Pragmatic approaches to using computational methods to predict xenobiotic metabolism. ( 0,6900284153507 )
J Biomed Inform - Semi-supervised clinical text classification with Laplacian SVMs: an application to cancer case management. ( 0,688063353880782 )
Int J Neural Syst - Span: spike pattern association neuron for learning spatio-temporal spike patterns. ( 0,688061252556021 )
Int J Med Inform - Where should electronic records for patients be stored? ( 0,686091279836636 )
Neural Comput - Extended robust support vector machine based on financial risk minimization. ( 0,683851843911949 )
Int J Neural Syst - Aggregation of sparse linear discriminant analyses for event-related potential classification in brain-computer interface. ( 0,680058120235656 )
J Med Syst - 3D similarity-dissimilarity plot for high dimensional data visualization in the context of biomedical pattern classification. ( 0,679845652164169 )
Comput Methods Programs Biomed - Machine learning algorithms and forced oscillation measurements applied to the automatic identification of chronic obstructive pulmonary disease. ( 0,678503579557482 )
Comput. Biol. Med. - Robust prediction of protein subcellular localization combining PCA and WSVMs. ( 0,676770590896189 )
Neural Comput - Adaptive metric learning vector quantization for ordinal classification. ( 0,671444009265219 )
J Am Med Inform Assoc - Active learning for clinical text classification: is it better than random sampling? ( 0,67043011483555 )
Int J Neural Syst - Structurally enhanced incremental neural learning for image classification with subgraph extraction. ( 0,669842301178968 )
IEEE Trans Image Process - Manifold regularized multitask learning for semi-supervised multilabel image classification. ( 0,667360239959089 )
IEEE J Biomed Health Inform - The effect of sample age and prediction resolution on myocardial infarction risk prediction. ( 0,667287961918791 )
J Am Med Inform Assoc - Machine learning for predicting the response of breast cancer to neoadjuvant chemotherapy. ( 0,666540567407789 )
Comput. Biol. Med. - A learning method for the class imbalance problem with medical data sets. ( 0,661352672007695 )
IEEE Trans Image Process - Geodesic propagation for semantic labeling. ( 0,658870121792057 )
Comput Methods Programs Biomed - Modified CC-LR algorithm with three diverse feature sets for motor imagery tasks classification in EEG based brain-computer interface. ( 0,658864747141673 )
IEEE Trans Neural Netw Learn Syst - Hyperparameter Selection for Gaussian Process One-Class Classification. ( 0,658010797806592 )
IEEE J Biomed Health Inform - Systematic Poisoning Attacks on and Defenses for Machine Learning in Healthcare. ( 0,656174336648144 )
Neural Comput - Computing sparse representations of multidimensional signals using Kronecker bases. ( 0,654961448448174 )
IEEE Trans Image Process - Multiview Hessian regularization for image annotation. ( 0,651150266604326 )
J Biomed Inform - Class proximity measures--dissimilarity-based classification and display of high-dimensional data. ( 0,649361129292384 )
Int J Neural Syst - Online semi-supervised growing neural gas. ( 0,647604266754921 )
IEEE Trans Image Process - Task-specific image partitioning. ( 0,647088488343016 )
IEEE Trans Image Process - Saliency and gist features for target detection in satellite images. ( 0,646788947328706 )
J Med Syst - Effective automated prediction of vertebral column pathologies based on logistic model tree with SMOTE preprocessing. ( 0,646153906264886 )
Neural Comput - Blocked 3×2 cross-validated t-test for comparing supervised classification learning algorithms. ( 0,645461488453207 )
IEEE Trans Pattern Anal Mach Intell - Representation Learning: A Review and New Perspectives. ( 0,644711114865459 )
IEEE Trans Neural Netw Learn Syst - Adaptive Batch Mode Active Learning. ( 0,644694589788509 )
J Biomed Inform - Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms. ( 0,644507082265237 )
IEEE Trans Image Process - Active learning for solving the incomplete data problem in facial age classification by the furthest nearest-neighbor criterion. ( 0,64271260748824 )
IEEE Trans Image Process - Hyperspectral image classification through bilayer graph-based learning. ( 0,640224633057536 )
Neural Comput - Reduction from cost-sensitive ordinal ranking to weighted binary classification. ( 0,638729737853086 )
IEEE Trans Image Process - Image annotation by input-output structural grouping sparsity. ( 0,637290799756227 )
IEEE Trans Image Process - Joint segmentation of images and scanned point cloud in large-scale street scenes with low-annotation cost. ( 0,637064299738841 )
IEEE Trans Image Process - A linear support higher-order tensor machine for classification. ( 0,636495972553512 )
IEEE Trans Image Process - Artistic image analysis using graph-based learning approaches. ( 0,635742925892118 )
Neural Comput - Metacognitive learning in a fully complex-valued radial basis function neural network. ( 0,634874746558059 )
IEEE Trans Image Process - Self-supervised online metric learning with low rank constraint for scene categorization. ( 0,634181694133285 )
Artif Intell Med - An evaluation of heuristics for rule ranking. ( 0,633500876229954 )
Neural Comput - Multiple spectral kernel learning and a gaussian complexity computation. ( 0,633241600276865 )
AMIA Annu Symp Proc - Comparison and combination of several MeSH indexing approaches. ( 0,633020515409355 )
J Biomed Inform - Incremental Gaussian Discriminant Analysis based on Graybill and Deal weighted combination of estimators for brain tumour diagnosis. ( 0,632346558698392 )
IEEE Trans Image Process - Improving Web image search by bag-based reranking. ( 0,631976056288313 )
Comput. Biol. Med. - Sparse Manifold Clustering and Embedding to discriminate gene expression profiles of glioblastoma and meningioma tumors. ( 0,631297680579415 )
IEEE Trans Neural Netw Learn Syst - Partially shared latent factor learning with multiview data. ( 0,63128058025484 )
J Med Syst - A new approach: role of data mining in prediction of survival of burn patients. ( 0,631123047225316 )
IEEE J Biomed Health Inform - Supervised hierarchical Bayesian model-based electomyographic control and analysis. ( 0,629711544451994 )
IEEE Trans Pattern Anal Mach Intell - Feature Selection with Conjunctions of Decision Stumps and Learning from Microarray Data. ( 0,626941315285478 )
Neural Comput - Unsupervised learning of generative and discriminative weights encoding elementary image components in a predictive coding model of cortical function. ( 0,626813864873459 )
IEEE Trans Pattern Anal Mach Intell - Label Consistent K-SVD: Learning A Discriminative Dictionary for Recognition. ( 0,625802098035852 )
Artif Intell Med - Improved modeling of clinical data with kernel methods. ( 0,624187439927075 )
Neural Comput - Divergence-based vector quantization. ( 0,622180325967775 )
IEEE Trans Pattern Anal Mach Intell - Learning Categories from Few Examples with Multi Model Knowledge Transfer. ( 0,622103622563978 )
BMC Med Inform Decis Mak - Predicting disease risks from highly imbalanced data using random forest. ( 0,621403736613811 )
J Biomed Inform - Classification of CT pulmonary angiography reports by presence, chronicity, and location of pulmonary embolism with natural language processing. ( 0,62114089289962 )
J Biomed Inform - Learning classification models from multiple experts. ( 0,619506141314305 )
Methods Inf Med - Sensor-based fall risk assessment--an expert 'to go'. ( 0,617788874902471 )
IEEE Trans Image Process - Structured max-margin learning for inter-related classifier training and multilabel image annotation. ( 0,617074821351809 )
Int J Med Inform - Prediction of hospitalization due to heart diseases by supervised learning methods. ( 0,616457622811302 )
IEEE J Biomed Health Inform - Service-oriented medical system for supporting decisions with missing and imbalanced data. ( 0,615234456974737 )
J Biomed Inform - Portable automatic text classification for adverse drug reaction detection via multi-corpus training. ( 0,614321758252202 )
J. Comput. Biol. - Locally learning biomedical data using diffusion frames. ( 0,614013124207768 )
J Am Med Inform Assoc - Predicting complications of percutaneous coronary intervention using a novel support vector method. ( 0,611448753147409 )
IEEE Trans Neural Netw Learn Syst - ML-Tree: a tree-structure-based approach to multilabel learning. ( 0,608395730262741 )
IEEE Trans Pattern Anal Mach Intell - The Effect of Model Misspecification on Semi-Supervised Classification. ( 0,608306147356739 )
Methods Inf Med - Probability machines: consistent probability estimation using nonparametric learning machines. ( 0,605613100524354 )
BMC Med Inform Decis Mak - Sensors vs. experts - a performance comparison of sensor-based fall risk assessment vs. conventional assessment in a sample of geriatric patients. ( 0,603625969315076 )
IEEE Trans Pattern Anal Mach Intell - Facial Age Estimation by Learning from Label Distributions. ( 0,603423119103256 )
J Am Med Inform Assoc - Applying active learning to supervised word sense disambiguation in MEDLINE. ( 0,601835811662591 )
Comput Methods Programs Biomed - Multistage approach for clustering and classification of ECG data. ( 0,601499007104345 )
IEEE Trans Image Process - Subspaces indexing model on Grassmann manifold for image search. ( 0,600173788406718 )
J Chem Inf Model - Atom environment kernels on molecules. ( 0,598291323007739 )
Neural Comput - Online learning with (multiple) kernels: a review. ( 0,597983711378402 )
IEEE Trans Image Process - Design of non-linear kernel dictionaries for object recognition. ( 0,597907518034222 )
J Biomed Inform - Active learning strategies for the deduplication of electronic patient data using classification trees. ( 0,597565472327745 )
Int J Neural Syst - Linear time relational prototype based learning. ( 0,597467726938268 )
Lifetime Data Anal - ROC analysis for multiple markers with tree-based classification. ( 0,595440874105138 )
BMC Med Inform Decis Mak - Mining geriatric assessment data for in-patient fall prediction models and high-risk subgroups. ( 0,594146227654479 )
BMC Med Inform Decis Mak - Artificial neural network models for prediction of cardiovascular autonomic dysfunction in general Chinese population. ( 0,590737086481957 )
IEEE Trans Image Process - Incremental training of a detector using online sparse eigendecomposition. ( 0,588310950734582 )
BMC Med Inform Decis Mak - Learning to improve medical decision making from imbalanced data without a priori cost. ( 0,588107999175149 )
IEEE Trans Image Process - Learning discriminative dictionary for group sparse representation. ( 0,585397052147561 )