Brief. Bioinformatics - Class-imbalanced classifiers for high-dimensional data.

Topics

{ featur(3375) classif(2383) classifi(1994) }
{ case(1353) use(1143) diagnosi(1136) }
{ learn(2355) train(1041) set(1003) }
{ implement(1333) system(1263) develop(1122) }
{ sampl(1606) size(1419) use(1276) }
{ data(1737) use(1416) pattern(1282) }
{ method(1219) similar(1157) match(930) }
{ perform(999) metric(946) measur(919) }
{ studi(2440) review(1878) systemat(933) }
{ error(1145) method(1030) estim(1020) }
{ model(2341) predict(2261) use(1141) }
{ perform(1367) use(1326) method(1137) }
{ age(1611) year(1155) adult(843) }
{ use(1733) differ(960) four(931) }
{ take(945) account(800) differ(722) }
{ howev(809) still(633) remain(590) }
{ research(1085) discuss(1038) issu(1018) }
{ model(3480) simul(1196) paramet(876) }
{ signal(2180) analysi(812) frequenc(800) }
{ gene(2352) biolog(1181) express(1162) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ result(1111) use(1088) new(759) }
{ model(3404) distribut(989) bayesian(671) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ motion(1329) object(1292) video(1091) }
{ problem(2511) optim(1539) algorithm(950) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ data(1714) softwar(1251) tool(1186) }
{ model(2220) cell(1177) simul(1124) }
{ visual(1396) interact(850) tool(830) }
{ studi(1119) effect(1106) posit(819) }
{ medic(1828) order(1363) alert(1069) }
{ high(1669) rate(1365) level(1280) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ framework(1458) process(801) describ(734) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ extract(1171) text(1153) clinic(932) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ care(1570) inform(1187) nurs(1089) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ compound(1573) activ(1297) structur(1058) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ monitor(1329) mobil(1314) devic(1160) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ drug(1928) target(777) effect(648) }
{ survey(1388) particip(1329) question(1065) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ method(1969) cluster(1462) data(1082) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

A class-imbalanced classifier is a decision rule that predicts the class membership of new samples from an available data set in which the class sizes differ considerably. When the class sizes are very different, most standard classification algorithms may favor the larger (majority) class, resulting in poor accuracy in predicting the minority class. A class-imbalanced classifier typically modifies a standard classifier, either by applying a correction strategy or by incorporating a new strategy in the training phase, to account for the differing class sizes. This article reviews and evaluates some of the most important methods for class prediction of high-dimensional imbalanced data. The evaluation addresses the fundamental issues of the class-imbalanced classification problem: imbalance ratio, small disjuncts and overlap complexity, lack of data and feature selection. Four class-imbalanced classifiers are considered: three standard classification algorithms, each coupled with an ensemble correction strategy, and one support vector machine (SVM)-based correction classifier. The three algorithms are (i) diagonal linear discriminant analysis (DLDA), (ii) random forests (RFs) and (iii) SVMs. The SVM-based correction classifier is SVM threshold adjustment (SVM-THR). A Monte Carlo simulation and five genomic data sets were used to illustrate the analysis and address the issues. The SVM-ensemble classifier appears to perform best when the class imbalance is not too severe. SVM-THR performs well if the imbalance is severe and the predictors are highly correlated. DLDA with feature selection can perform well without the ensemble correction.
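
To make the idea of coupling a standard algorithm with an ensemble correction concrete, the following is a minimal sketch in plain Python. The abstract does not specify how the ensemble correction is built, so this sketch assumes one common variant: the majority class is repeatedly undersampled to the minority class size, a DLDA classifier with equal class priors is fitted to each balanced subsample, and predictions are combined by majority vote. The function names (`fit_dlda`, `predict_dlda`, `fit_undersampled_ensemble`, `predict_ensemble`) and the parameters `n_models` and `seed` are hypothetical, chosen for illustration only, and are not taken from the article.

```python
import random
from statistics import mean


def fit_dlda(X, y):
    """Fit diagonal LDA: per-class feature means plus pooled per-feature variances."""
    classes = sorted(set(y))
    n_features = len(X[0])
    means = {}
    pooled = [0.0] * n_features
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [mean(col) for col in zip(*rows)]
        for j in range(n_features):
            pooled[j] += sum((r[j] - means[c][j]) ** 2 for r in rows)
    dof = max(len(X) - len(classes), 1)
    # Floor the variances so constant features do not cause division by zero.
    variances = [max(s / dof, 1e-12) for s in pooled]
    return means, variances


def predict_dlda(model, x):
    """Assign x to the class whose mean is nearest in variance-scaled distance.

    Assumption: equal class priors, which is what balanced subsamples imply.
    """
    means, variances = model

    def scaled_distance(c):
        return sum((xj - mj) ** 2 / vj for xj, mj, vj in zip(x, means[c], variances))

    return min(means, key=scaled_distance)


def fit_undersampled_ensemble(X, y, n_models=25, seed=0):
    """Assumed ensemble correction: one DLDA model per balanced random subsample."""
    rng = random.Random(seed)
    by_class = {}
    for x, label in zip(X, y):
        by_class.setdefault(label, []).append(x)
    n_min = min(len(rows) for rows in by_class.values())
    models = []
    for _ in range(n_models):
        Xb, yb = [], []
        for label, rows in by_class.items():
            # Draw n_min samples per class, so every class has the same size.
            for x in rng.sample(rows, n_min):
                Xb.append(x)
                yb.append(label)
        models.append(fit_dlda(Xb, yb))
    return models


def predict_ensemble(models, x):
    """Combine the base classifiers by majority vote."""
    votes = [predict_dlda(m, x) for m in models]
    return max(set(votes), key=votes.count)


if __name__ == "__main__":
    # Toy data: 40 majority samples near the origin, 5 minority samples near (3, 3).
    majority = [[(i % 5) * 0.2 - 0.4, (i // 5 % 4) * 0.2 - 0.3] for i in range(40)]
    minority = [[3 + 0.1 * k, 3 - 0.1 * k] for k in range(5)]
    X, y = majority + minority, [0] * 40 + [1] * 5
    models = fit_undersampled_ensemble(X, y, n_models=11, seed=0)
    print(predict_ensemble(models, [3.0, 3.0]), predict_ensemble(models, [0.0, 0.0]))
```

Using an odd `n_models` avoids tied votes in the two-class case. Because each base model sees equally many samples from every class, no single model is pulled toward the majority class, while the ensemble as a whole still uses most of the majority data across subsamples.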

Cleaned Abstract

classimbalanc classifi decis rule predict class membership new sampl avail data set class size differ consider class size differ standard classif algorithm may favor larger major class result poor accuraci minor class predict classimbalanc classifi typic modifi standard classifi correct strategi incorpor new strategi train phase account differenti class size articl review evalu import method class predict highdimension imbalanc data evalu address fundament issu classimbalanc classif problem imbal ratio small disjunct overlap complex lack data featur select four classimbalanc classifi consid four classifi includ three standard classif algorithm coupl ensembl correct strategi one support vector machin svmbase correct classifi three algorithm diagon linear discrimin analysi dlda ii random forest rfs ii svms svmbase correct classifi svm threshold adjust svmthr montecarlo simul five genom data set use illustr analysi address issu svmensembl classifi appear perform best class imbal sever svmthr perform well imbal sever predictor high correl dlda featur select can perform well without use ensembl correct

Similar Abstracts

Artif Intell Med - Texture feature ranking with relevance learning to classify interstitial lung disease patterns. ( 0,85738998412596 )
Comput Biol Chem - Information-theoretic approaches to SVM feature selection for metagenome read classification. ( 0,84549614138846 )
IEEE Trans Image Process - A novel technique for subpixel image classification based on support vector machine. ( 0,827305346870926 )
Comput Biol Chem - A novel divide-and-merge classification for high dimensional datasets. ( 0,824015134831017 )
Comput Math Methods Med - Comparison of different EHG feature selection methods for the detection of preterm labor. ( 0,816624188427283 )
J Med Syst - A new expert system for diagnosis of lung cancer: GDA-LS_SVM. ( 0,812445515304894 )
J Med Syst - A three-stage expert system based on support vector machines for thyroid disease diagnosis. ( 0,806918689399342 )
J Chem Inf Model - Classifying molecules using a sparse probabilistic kernel binary classifier. ( 0,799170750388199 )
Comput. Biol. Med. - Pairwise FCM based feature weighting for improved classification of vertebral column disorders. ( 0,798745848753696 )
J Med Syst - A robust multi-class feature selection strategy based on Rotation Forest Ensemble algorithm for diagnosis of Erythemato-Squamous diseases. ( 0,795828403097034 )
Comput. Biol. Med. - An ensemble system for automatic sleep stage classification using single channel EEG signal. ( 0,79482329159681 )
J Med Syst - SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease. ( 0,792377442022333 )
Comput Biol Chem - Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm. ( 0,790757747751621 )
Int J Comput Assist Radiol Surg - Building an ensemble system for diagnosing masses in mammograms. ( 0,788008012371997 )
Artif Intell Med - An intelligent classifier for prognosis of cardiac resynchronization therapy based on speckle-tracking echocardiograms. ( 0,785854930789219 )
Int J Neural Syst - Improved adaptive splitting and selection: the hybrid training method of a classifier based on a feature space partitioning. ( 0,784580074489323 )
J Biomed Inform - Automatic figure classification in bioscience literature. ( 0,78363839623681 )
IEEE J Biomed Health Inform - Automatic detection of atrial fibrillation in cardiac vibration signals. ( 0,78340792324463 )
Comput. Biol. Med. - Fast and efficient lung disease classification using hierarchical one-against-all support vector machine and cost-sensitive feature selection. ( 0,779692186242313 )
J Integr Bioinform - On the parameter optimization of Support Vector Machines for binary classification. ( 0,77705083607029 )
Comput. Biol. Med. - Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders. ( 0,774487984379608 )
Comput Biol Chem - Derivation of an artificial gene to improve classification accuracy upon gene selection. ( 0,772509958553161 )
Med Biol Eng Comput - Wavelet-based sparse functional linear model with applications to EEGs seizure detection and epilepsy diagnosis. ( 0,772085574993007 )
IEEE J Biomed Health Inform - Support vector machine classification based on correlation prototypes applied to bone age assessment. ( 0,772051985377853 )
Comput. Biol. Med. - A novel class dependent feature selection method for cancer biomarker discovery. ( 0,770936722681437 )
J Med Syst - Symptomatic vs. asymptomatic plaque classification in carotid ultrasound. ( 0,770729624205777 )
Comput Methods Programs Biomed - Hepatitis disease diagnosis using a novel hybrid method based on support vector machine and simulated annealing (SVM-SA). ( 0,770696392863917 )
Comput Biol Chem - CE-PLoc: an ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition. ( 0,770031140847412 )
Comput Methods Programs Biomed - A random forest classifier for lymph diseases. ( 0,769592117993734 )
Comput Math Methods Med - SVM versus MAP on accelerometer data to distinguish among locomotor activities executed at different speeds. ( 0,768661269098012 )
Comput Methods Programs Biomed - Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. ( 0,766665134398287 )
Artif Intell Med - Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction. ( 0,765459940579233 )
Comput Methods Programs Biomed - Automatic cervical cell segmentation and classification in Pap smears. ( 0,765285114345733 )
Comput. Biol. Med. - Heartbeat classification using disease-specific feature selection. ( 0,765187077834547 )
J Med Syst - An intelligent system for lung cancer diagnosis using a new genetic algorithm based feature selection method. ( 0,764865888208287 )
J Am Med Inform Assoc - Influenza detection from emergency department reports using natural language processing and Bayesian network classifiers. ( 0,764539931202617 )
Comput Methods Programs Biomed - Complex extreme learning machine applications in terahertz pulsed signals feature sets. ( 0,763803877191101 )
J Biomed Inform - A fast gene selection method for multi-cancer classification using multiple support vector data description. ( 0,760320971748105 )
Int J Comput Assist Radiol Surg - Multimodality GPU-based computer-assisted diagnosis of breast cancer using ultrasound and digital mammography images. ( 0,760201577051176 )
J Med Syst - A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms. ( 0,75962330223779 )
Int J Neural Syst - Assessment of feature selection and classification approaches to enhance information from overnight oximetry in the context of apnea diagnosis. ( 0,757784004504654 )
Artif Intell Med - Classification of small lesions on dynamic breast MRI: Integrating dimension reduction and out-of-sample extension into CADx methodology. ( 0,754696473322176 )
Comput. Biol. Med. - Contourlet-based mammography mass classification using the SVM family. ( 0,754368464647473 )
J Am Med Inform Assoc - A comparative analysis of methods for predicting clinical outcomes using high-dimensional genomic datasets. ( 0,754312297236956 )
Int J Neural Syst - Single-trial motor imagery classification using asymmetry ratio, phase relation, wavelet-based fractal, and their selected combination. ( 0,752649861840205 )
Comput Methods Programs Biomed - An improved method of early diagnosis of smoking-induced respiratory changes using machine learning algorithms. ( 0,752428563644183 )
Comput Methods Programs Biomed - An associative memory approach to medical decision support systems. ( 0,75043506672489 )
IEEE Trans Image Process - Walsh-Hadamard transform kernel-based feature vector for shot boundary detection. ( 0,748075194018048 )
Comput Math Methods Med - Discrimination between Alzheimer's disease and mild cognitive impairment using SOM and PSO-SVM. ( 0,746617695370083 )
IEEE J Biomed Health Inform - Recognizing common CT imaging signs of lung diseases through a new feature selection method based on Fisher criterion and genetic optimization. ( 0,746283133966879 )
J Med Syst - Diagnosis of diabetes diseases using an Artificial Immune Recognition System2 (AIRS2) with fuzzy K-nearest neighbor. ( 0,745627772269217 )
Comput Methods Programs Biomed - A new hybrid intelligent system for accurate detection of Parkinson's disease. ( 0,741529436875767 )
Comput Math Methods Med - Feature selection in classification of eye movements using electrooculography for activity recognition. ( 0,740751608247475 )
Artif Intell Med - Improving the accuracy of suicide attempter classification. ( 0,740257378928486 )
AMIA Annu Symp Proc - Improving predictions in imbalanced data using Pairwise Expanded Logistic Regression. ( 0,740004364711949 )
Comput Math Methods Med - Mixed-norm regularization for brain decoding. ( 0,738361275141698 )
Artif Intell Med - Instance-based classifiers applied to medical databases: diagnosis and knowledge extraction. ( 0,738352571608176 )
J Med Syst - Enhanced cancer recognition system based on random forests feature elimination algorithm. ( 0,737010856278607 )
Comput. Biol. Med. - Decision forest for classification of gene expression data. ( 0,734975946951085 )
Artif Intell Med - Selective voting in convex-hull ensembles improves classification accuracy. ( 0,733305662260377 )
Comput. Biol. Med. - A hybrid feature selection method for DNA microarray data. ( 0,732254417270294 )
Artif Intell Med - Classification of infectious diseases based on chemiluminescent signatures of phagocytes in whole blood. ( 0,731845405242305 )
Comput. Biol. Med. - Classification of diffusion tensor images for the early detection of Alzheimer's disease. ( 0,731422095276882 )
Int J Neural Syst - Extraction of neural control commands using myoelectric pattern recognition: a novel application in adults with cerebral palsy. ( 0,729287648165566 )
J Am Med Inform Assoc - N-gram support vector machines for scalable procedure and diagnosis classification, with applications to clinical free text data from the intensive care unit. ( 0,728842479038025 )
Comput. Biol. Med. - Ant colony optimization-based feature selection method for surface electromyography signals classification. ( 0,727521047296293 )
Med Biol Eng Comput - Pathological speech signal analysis and classification using empirical mode decomposition. ( 0,724688868199132 )
Comput. Biol. Med. - Classification of Error-Related Negativity (ERN) and Positivity (Pe) potentials using kNN and Support Vector Machines. ( 0,724585376679481 )
J Chem Inf Model - Choosing feature selection and learning algorithms in QSAR. ( 0,723073388465535 )
Artif Intell Med - Electrocardiogram analysis using a combination of statistical, geometric, and nonlinear heart rate variability features. ( 0,723039552618156 )
J Med Syst - Luminance sticker based facial expression recognition using discrete wavelet transform for physically disabled persons. ( 0,722831232932952 )
IEEE Trans Image Process - Efficient HIK SVM learning for image classification. ( 0,720808740044312 )
Comput Methods Programs Biomed - Understanding symptomatology of atherosclerotic plaque by image-based tissue characterization. ( 0,719356468996134 )
J Biomed Inform - An efficient statistical feature selection approach for classification of gene expression data. ( 0,718973133618274 )
Artif Intell Med - Selection of effective features for ECG beat recognition based on nonlinear correlations. ( 0,718541180885325 )
Brief. Bioinformatics - Ensemble learning algorithms for classification of mtDNA into haplogroups. ( 0,714899288772311 )
Comput Methods Programs Biomed - Functional activity maps based on significance measures and Independent Component Analysis. ( 0,714603741383622 )
Int J Comput Assist Radiol Surg - Disc herniation diagnosis in MRI using a CAD framework and a two-level classifier. ( 0,714296543714677 )
Comput Methods Programs Biomed - Performance comparison of machine learning methods for prognosis of hormone receptor status in breast cancer tissue samples. ( 0,714047151063585 )
J Med Syst - Classification of normal and diseased liver shapes based on Spherical Harmonics coefficients. ( 0,713417126951077 )
Neural Comput - An Infomax algorithm can perform both familiarity discrimination and feature extraction in a single network. ( 0,713177206212701 )
Artif Intell Med - Subpopulation-specific confidence designation for more informative biomedical classification. ( 0,710625760113296 )
Comput. Biol. Med. - Computer-aided diagnosis system for the Acute Respiratory Distress Syndrome from chest radiographs. ( 0,710391483818603 )
J Med Syst - Classification of speech dysfluencies using LPC based parameterization techniques. ( 0,708910386912552 )
Neural Comput - The support feature machine: classification with the least number of features and application to neuroimaging data. ( 0,706006132987289 )
J Med Syst - Statistical analysis of textural features for improved classification of oral histopathological images. ( 0,705680024205918 )
J Med Syst - Detection of carotid artery disease by using Learning Vector Quantization Neural Network. ( 0,705041271699683 )
J Med Syst - An integrated index for the identification of diabetic retinopathy stages using texture parameters. ( 0,702535751222357 )
Comput. Biol. Med. - SVM-based feature selection to optimize sensitivity-specificity balance applied to weaning. ( 0,701455179673788 )
J Med Syst - Automated screening of arrhythmia using wavelet based machine learning techniques. ( 0,69950892246734 )
Comput Math Methods Med - Comparison of two methods forecasting binding rate of plasma protein. ( 0,697865479629751 )
J Am Med Inform Assoc - Learning regular expressions for clinical text classification. ( 0,697724710301611 )
J Chem Inf Model - Large-scale learning of structure-activity relationships using a linear support vector machine and problem-specific metrics. ( 0,697483181497908 )
Artif Intell Med - Supervised machine learning-based classification of oral malodor based on the microbiota in saliva samples. ( 0,695858083482463 )
BMC Med Inform Decis Mak - Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes. ( 0,695554410535355 )
J Chem Inf Model - A binary ant colony optimization classifier for molecular activities. ( 0,694686531866391 )
Comput Math Methods Med - Determination of fetal state from cardiotocogram using LS-SVM with particle swarm optimization and binary decision tree. ( 0,693417696959043 )
Comput. Biol. Med. - A new feature extraction framework based on wavelets for breast cancer diagnosis. ( 0,693394363107986 )
Int J Neural Syst - Combination of heterogeneous EEG feature extraction methods and stacked sequential learning for sleep stage classification. ( 0,692779012610672 )
J Biomed Inform - Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso. ( 0,692603202444232 )