Comput Biol Chem - Information-theoretic approaches to SVM feature selection for metagenome read classification.

Topics

{ featur(3375) classif(2383) classifi(1994) }
{ sequenc(1873) structur(1644) protein(1328) }
{ learn(2355) train(1041) set(1003) }
{ sampl(1606) size(1419) use(1276) }
{ perform(999) metric(946) measur(919) }
{ perform(1367) use(1326) method(1137) }
{ extract(1171) text(1153) clinic(932) }
{ howev(809) still(633) remain(590) }
{ high(1669) rate(1365) level(1280) }
{ process(1125) use(805) approach(778) }
{ method(1219) similar(1157) match(930) }
{ error(1145) method(1030) estim(1020) }
{ general(901) number(790) one(736) }
{ method(984) reconstruct(947) comput(926) }
{ data(1737) use(1416) pattern(1282) }
{ bind(1733) structur(1185) ligand(1036) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ chang(1828) time(1643) increas(1301) }
{ algorithm(1844) comput(1787) effici(935) }
{ method(1557) propos(1049) approach(1037) }
{ featur(1941) imag(1645) propos(1176) }
{ data(3963) clinic(1234) research(1004) }
{ studi(1119) effect(1106) posit(819) }
{ monitor(1329) mobil(1314) devic(1160) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ activ(1138) subject(705) human(624) }
{ time(1939) patient(1703) rate(768) }
{ patient(1821) servic(1111) care(1106) }
{ use(2086) technolog(871) perceiv(783) }
{ structur(1116) can(940) graph(676) }
{ implement(1333) system(1263) develop(1122) }
{ method(1969) cluster(1462) data(1082) }
{ model(3404) distribut(989) bayesian(671) }
{ can(774) often(719) complex(702) }
{ imag(1947) propos(1133) code(1026) }
{ inform(2794) health(2639) internet(1427) }
{ system(1976) rule(880) can(841) }
{ measur(2081) correl(1212) valu(896) }
{ imag(1057) registr(996) error(939) }
{ imag(2830) propos(1344) filter(1198) }
{ network(2748) neural(1063) input(814) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ take(945) account(800) differ(722) }
{ studi(2440) review(1878) systemat(933) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ treatment(1704) effect(941) patient(846) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ concept(1167) ontolog(924) domain(897) }
{ clinic(1479) use(1117) guidelin(835) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ model(2220) cell(1177) simul(1124) }
{ care(1570) inform(1187) nurs(1089) }
{ search(2224) databas(1162) retriev(909) }
{ case(1353) use(1143) diagnosi(1136) }
{ studi(1410) differ(1259) use(1210) }
{ risk(3053) factor(974) diseas(938) }
{ research(1085) discuss(1038) issu(1018) }
{ system(1050) medic(1026) inform(1018) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ blood(1257) pressur(1144) flow(957) }
{ spatial(1525) area(1432) region(1030) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ model(3480) simul(1196) paramet(876) }
{ ehr(2073) health(1662) electron(1139) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ medic(1828) order(1363) alert(1069) }
{ signal(2180) analysi(812) frequenc(800) }
{ cost(1906) reduc(1198) effect(832) }
{ group(2977) signific(1463) compar(1072) }
{ gene(2352) biolog(1181) express(1162) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ intervent(3218) particip(2042) group(1664) }
{ can(981) present(881) function(850) }
{ analysi(2126) use(1163) compon(1037) }
{ health(1844) social(1437) communiti(874) }
{ cancer(2502) breast(956) screen(824) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ decis(3086) make(1611) patient(1517) }
{ activ(1452) weight(1219) physic(1104) }
{ method(2212) result(1239) propos(1039) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Analysis of DNA sequences isolated directly from the environment, known as metagenomics, produces a large quantity of genome fragments that need to be classified into specific taxa. Most composition-based classification methods use all features rather than a subset of features that could maximize classifier accuracy. We show that feature selection methods can boost the performance of taxonomic classifiers. This work proposes three filter-based feature selection methods that stem from information theory: (1) a technique that combines Kullback-Leibler divergence, mutual information, and distance information; (2) TF-IDF, a text mining technique; and (3) minimum-redundancy maximum-relevance (mRMR). The feature selection methods are compared by how much they improve support vector machine (SVM) classification of genomic reads. Overall, the 6-mer mRMR method performs well, especially at the phylum level. If the total number of features is very large, feature selection becomes difficult because a small subset of features that captures most of the data variance is less likely to exist. We therefore conclude that there is a trade-off between feature-set size and feature selection method when optimizing classification performance. For larger feature-set sizes, TF-IDF works better at finer taxonomic resolutions, while mRMR performs best of all methods for N = 6 at all taxonomic levels.
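
The abstract does not say how TF-IDF is applied to reads. The following is a minimal sketch under one plausible reading, which is an assumption and not the authors' method: each taxon's pooled training reads form one "document", each overlapping k-mer is a "term", and k-mers are ranked by their highest TF-IDF score in any taxon. The function names `kmer_counts` and `tfidf_select` are hypothetical. For scale, the full 6-mer feature space over {A, C, G, T} has 4^6 = 4,096 dimensions.

```python
import math
from collections import Counter


def kmer_counts(seq, k):
    """Count overlapping k-mers in a read, skipping windows that contain non-ACGT bases."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if all(base in "ACGT" for base in kmer):
            counts[kmer] += 1
    return counts


def tfidf_select(reads_by_taxon, k, n_features):
    """Keep the n_features k-mers with the highest TF-IDF score in any taxon.

    Assumption (not from the paper): each taxon's pooled training reads form
    one "document" and each k-mer is a "term".
    """
    # Term frequency: pooled k-mer counts per taxon, normalized to sum to 1.
    tf = {}
    for taxon, reads in reads_by_taxon.items():
        pooled = Counter()
        for read in reads:
            pooled.update(kmer_counts(read, k))
        total = sum(pooled.values()) or 1
        tf[taxon] = {kmer: c / total for kmer, c in pooled.items()}

    # Document frequency: number of taxa in which each k-mer occurs at all.
    n_taxa = len(tf)
    df = Counter(kmer for freqs in tf.values() for kmer in freqs)

    # Score each k-mer by its best TF * IDF over taxa; k-mers present in every
    # taxon get IDF = log(1) = 0 and therefore sink to the bottom.
    scores = {}
    for freqs in tf.values():
        for kmer, f in freqs.items():
            score = f * math.log(n_taxa / df[kmer])
            scores[kmer] = max(scores.get(kmer, 0.0), score)

    # Sort by descending score, breaking ties alphabetically for reproducibility.
    ranked = sorted(scores, key=lambda kmer: (-scores[kmer], kmer))
    return ranked[:n_features]


reads = {"taxonA": ["AAAAAA", "AAAACG"], "taxonB": ["CCCCCC", "CCCCGT"]}
print(tfidf_select(reads, k=2, n_features=3))  # ['AA', 'CC', 'AC']
```

In this toy example the shared 2-mer `CG` occurs in both taxa, so its IDF is zero and it ranks last, while the taxon-specific `AA` and `CC` rank first.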
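
mRMR builds the feature subset greedily. At each step it adds the candidate whose mutual information with the class label (relevance), minus its mean mutual information with the features already chosen (redundancy), is largest. The abstract does not state which mRMR variant, discretization, or feature encoding was used. This minimal sketch therefore assumes discrete per-read features (for example, binned k-mer counts) and the difference form of the criterion. The names `mutual_information` and `mrmr` are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum over observed (a, b) of p(a,b) * log(p(a,b) / (p(a) * p(b))), written with counts.
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mrmr(features, labels, n_select):
    """Greedy mRMR selection using the difference criterion (relevance - mean redundancy).

    features: dict mapping feature name -> list of discrete values, one per read
    labels:   list of taxon labels, one per read
    """
    relevance = {name: mutual_information(vals, labels) for name, vals in features.items()}
    selected = []
    candidates = sorted(features)  # sorted so ties resolve deterministically

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
        return relevance[name] - redundancy / len(selected)

    while candidates and len(selected) < n_select:
        best = max(candidates, key=score)  # max() keeps the first of tied candidates
        selected.append(best)
        candidates.remove(best)
    return selected


labels = [0, 0, 1, 1, 2, 2, 3, 3]
features = {
    "f1": [0, 0, 0, 0, 1, 1, 1, 1],  # separates taxa {0, 1} from {2, 3}
    "f2": [0, 0, 0, 0, 1, 1, 1, 1],  # exact copy of f1: relevant but redundant
    "f3": [0, 0, 1, 1, 0, 0, 1, 1],  # separates taxa {0, 2} from {1, 3}
}
print(mrmr(features, labels, 2))  # ['f1', 'f3']
```

Each toy feature carries log 2 nats of information about the taxon, so ranking by relevance alone cannot tell them apart. The redundancy term is what makes the sketch skip the duplicate `f2` in favor of the complementary `f3`.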

Cleaned Abstract (stemmed, stop words removed)

analysi dna sequenc isol direct environ known metagenom produc larg quantiti genom fragment need classifi specif taxa compositionbas classif method use featur instead subset featur may maxim classifi accuraci show featur select method can boost perform taxonom classifi work propos three differ filterbas featur select method stem inform theori techniqu combin kullbackleibl mutual inform distanc inform text mine techniqu tfidf minimum redundancymaximumrelev mrmr featur select method compar well improv support vector machin classif genom read overal mer mrmr method perform well especi phylalevel number total featur larg featur select becom difficult small subset featur captur major data varianc less like exist therefor conclud tradeoff featur set size featur select method optim classif perform larger featur set size tfidf work better finerresolut mrmr perform best method n taxonom level

Similar Abstracts (similarity score in parentheses)

Comput Biol Chem - A novel divide-and-merge classification for high dimensional datasets. ( 0,911817307312659 )
Artif Intell Med - Texture feature ranking with relevance learning to classify interstitial lung disease patterns. ( 0,900470396929134 )
Comput Math Methods Med - Comparison of different EHG feature selection methods for the detection of preterm labor. ( 0,889275858810669 )
Int J Comput Assist Radiol Surg - Building an ensemble system for diagnosing masses in mammograms. ( 0,888140161725067 )
J Med Syst - A robust multi-class feature selection strategy based on Rotation Forest Ensemble algorithm for diagnosis of Erythemato-Squamous diseases. ( 0,881120920823236 )
Comput. Biol. Med. - Contourlet-based mammography mass classification using the SVM family. ( 0,875706358602637 )
J Biomed Inform - Automatic figure classification in bioscience literature. ( 0,875098561357254 )
Comput Math Methods Med - SVM versus MAP on accelerometer data to distinguish among locomotor activities executed at different speeds. ( 0,873986718405546 )
Comput. Biol. Med. - An ensemble system for automatic sleep stage classification using single channel EEG signal. ( 0,873100645179147 )
Comput Biol Chem - Derivation of an artificial gene to improve classification accuracy upon gene selection. ( 0,869585842952387 )
Comput. Biol. Med. - Fast and efficient lung disease classification using hierarchical one-against-all support vector machine and cost-sensitive feature selection. ( 0,86802627674214 )
Comput. Biol. Med. - Disulfide connectivity prediction based on structural information without a prior knowledge of the bonding state of cysteines. ( 0,862710191576304 )
J Med Syst - SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease. ( 0,861554962736223 )
Comput Biol Chem - Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm. ( 0,861325021426086 )
J Biomed Inform - A fast gene selection method for multi-cancer classification using multiple support vector data description. ( 0,85727120690849 )
Artif Intell Med - Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction. ( 0,8513560432015 )
Comput. Biol. Med. - A novel class dependent feature selection method for cancer biomarker discovery. ( 0,848196632834251 )
IEEE Trans Image Process - A novel technique for subpixel image classification based on support vector machine. ( 0,847908107949029 )
Comput. Biol. Med. - Pairwise FCM based feature weighting for improved classification of vertebral column disorders. ( 0,846973056998543 )
Int J Neural Syst - Single-trial motor imagery classification using asymmetry ratio, phase relation, wavelet-based fractal, and their selected combination. ( 0,846765470377177 )
Comput Biol Chem - CE-PLoc: an ensemble classifier for predicting protein subcellular locations by fusing different modes of pseudo amino acid composition. ( 0,846418997405737 )
Brief. Bioinformatics - Class-imbalanced classifiers for high-dimensional data. ( 0,845496141388459 )
Artif Intell Med - An intelligent classifier for prognosis of cardiac resynchronization therapy based on speckle-tracking echocardiograms. ( 0,844741573850229 )
Comput. Biol. Med. - Heartbeat classification using disease-specific feature selection. ( 0,84351376815661 )
Comput. Biol. Med. - Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders. ( 0,843103108257252 )
J Chem Inf Model - Classifying molecules using a sparse probabilistic kernel binary classifier. ( 0,842478022724845 )
J Am Med Inform Assoc - Learning regular expressions for clinical text classification. ( 0,8397510058164 )
Comput Methods Programs Biomed - A new hybrid intelligent system for accurate detection of Parkinson's disease. ( 0,835541329351814 )
J Med Syst - A comparative study on classification of sleep stage based on EEG signals using feature selection and classification algorithms. ( 0,833300279230874 )
Comput Math Methods Med - Discrimination between Alzheimer's disease and mild cognitive impairment using SOM and PSO-SVM. ( 0,832007030806333 )
Comput Methods Programs Biomed - A random forest classifier for lymph diseases. ( 0,831947164932516 )
J Med Syst - Enhanced cancer recognition system based on random forests feature elimination algorithm. ( 0,831557535056621 )
Comput Methods Programs Biomed - Automatic cervical cell segmentation and classification in Pap smears. ( 0,831122579217329 )
J Am Med Inform Assoc - A comparative analysis of methods for predicting clinical outcomes using high-dimensional genomic datasets. ( 0,82729713710967 )
Int J Comput Assist Radiol Surg - Multimodality GPU-based computer-assisted diagnosis of breast cancer using ultrasound and digital mammography images. ( 0,82561090235936 )
J Am Med Inform Assoc - Influenza detection from emergency department reports using natural language processing and Bayesian network classifiers. ( 0,821473450695077 )
IEEE J Biomed Health Inform - Recognizing common CT imaging signs of lung diseases through a new feature selection method based on Fisher criterion and genetic optimization. ( 0,82026228366311 )
IEEE J Biomed Health Inform - Automatic detection of atrial fibrillation in cardiac vibration signals. ( 0,818318894224507 )
J Med Syst - An intelligent system for lung cancer diagnosis using a new genetic algorithm based feature selection method. ( 0,815621238089472 )
J Integr Bioinform - On the parameter optimization of Support Vector Machines for binary classification. ( 0,81168821892686 )
J Med Syst - A new expert system for diagnosis of lung cancer: GDA-LS_SVM. ( 0,809524332333552 )
Brief. Bioinformatics - Ensemble learning algorithms for classification of mtDNA into haplogroups. ( 0,809300852279757 )
Comput Methods Programs Biomed - Hepatitis disease diagnosis using a novel hybrid method based on support vector machine and simulated annealing (SVM-SA). ( 0,809096646250289 )
Comput Math Methods Med - An ensemble-of-classifiers based approach for early diagnosis of Alzheimer's disease: classification using structural features of brain images. ( 0,808115746189504 )
Comput. Biol. Med. - SVM-based feature selection to optimize sensitivity-specificity balance applied to weaning. ( 0,806972282045926 )
Comput Math Methods Med - Feature selection in classification of eye movements using electrooculography for activity recognition. ( 0,805054160783151 )
Int J Neural Syst - Extraction of neural control commands using myoelectric pattern recognition: a novel application in adults with cerebral palsy. ( 0,803827258300303 )
J Med Syst - Symptomatic vs. asymptomatic plaque classification in carotid ultrasound. ( 0,798463574749654 )
J Med Syst - A three-stage expert system based on support vector machines for thyroid disease diagnosis. ( 0,794650931366162 )
J Med Syst - Classification of speech dysfluencies using LPC based parameterization techniques. ( 0,793962653632511 )
Med Biol Eng Comput - Wavelet-based sparse functional linear model with applications to EEGs seizure detection and epilepsy diagnosis. ( 0,793075020998701 )
Artif Intell Med - Selective voting in convex-hull ensembles improves classification accuracy. ( 0,792451431119206 )
J Med Syst - Detection of carotid artery disease by using Learning Vector Quantization Neural Network. ( 0,790841492828803 )
Comput. Biol. Med. - Ant colony optimization-based feature selection method for surface electromyography signals classification. ( 0,786375033440993 )
Int J Neural Syst - Improved adaptive splitting and selection: the hybrid training method of a classifier based on a feature space partitioning. ( 0,785869874103516 )
IEEE Trans Image Process - Efficient HIK SVM learning for image classification. ( 0,784810403951696 )
Comput Methods Programs Biomed - Understanding symptomatology of atherosclerotic plaque by image-based tissue characterization. ( 0,784418825297256 )
Int J Neural Syst - Assessment of feature selection and classification approaches to enhance information from overnight oximetry in the context of apnea diagnosis. ( 0,783560041012034 )
Int J Neural Syst - Combination of heterogeneous EEG feature extraction methods and stacked sequential learning for sleep stage classification. ( 0,783448807341994 )
Comput. Biol. Med. - A new dataset evaluation method based on category overlap. ( 0,783138440896667 )
Neural Comput - An Infomax algorithm can perform both familiarity discrimination and feature extraction in a single network. ( 0,782682932743567 )
IEEE Trans Image Process - Walsh-Hadamard transform kernel-based feature vector for shot boundary detection. ( 0,781158204240301 )
Comput. Biol. Med. - Decision forest for classification of gene expression data. ( 0,781032362831738 )
Comput. Biol. Med. - A hybrid feature selection method for DNA microarray data. ( 0,780477569195246 )
Comput. Biol. Med. - A classification system based on a new wrapper feature selection algorithm for the diagnosis of primary and secondary polycythemia. ( 0,780226069830853 )
Comput Methods Programs Biomed - Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. ( 0,779673283617973 )
J Med Syst - Similarity-dissimilarity plot for visualization of high dimensional data in biomedical pattern classification. ( 0,777358024147961 )
J Chem Inf Model - Choosing feature selection and learning algorithms in QSAR. ( 0,776190428259155 )
Artif Intell Med - Selection of effective features for ECG beat recognition based on nonlinear correlations. ( 0,773341042764504 )
IEEE J Biomed Health Inform - Computer-aided diagnosis in hysteroscopic imaging. ( 0,773248807595986 )
Comput Methods Programs Biomed - Complex extreme learning machine applications in terahertz pulsed signals feature sets. ( 0,772858884217683 )
Comput Math Methods Med - Comparison of two methods forecasting binding rate of plasma protein. ( 0,772060527287487 )
J Am Med Inform Assoc - N-gram support vector machines for scalable procedure and diagnosis classification, with applications to clinical free text data from the intensive care unit. ( 0,771866416511606 )
Artif Intell Med - Classification of small lesions on dynamic breast MRI: Integrating dimension reduction and out-of-sample extension into CADx methodology. ( 0,770324925230792 )
J Med Syst - Statistical analysis of textural features for improved classification of oral histopathological images. ( 0,769881559298588 )
Comput. Biol. Med. - A new feature extraction framework based on wavelets for breast cancer diagnosis. ( 0,768785802352992 )
J Biomed Inform - An efficient statistical feature selection approach for classification of gene expression data. ( 0,766950870988129 )
Artif Intell Med - Electrocardiogram analysis using a combination of statistical, geometric, and nonlinear heart rate variability features. ( 0,766549720805357 )
Comput. Biol. Med. - Ensemble selection for feature-based classification of diabetic maculopathy images. ( 0,765997950783094 )
J Med Syst - Automated diagnosis of Alzheimer disease using the scale-invariant feature transforms in magnetic resonance images. ( 0,76520562434829 )
J Biomed Inform - A biological continuum based approach for efficient clinical classification. ( 0,7646345497866 )
J Biomed Inform - Boosting performance of gene mention tagging system by hybrid methods. ( 0,764320891800764 )
J Chem Inf Model - Classifier ensemble based on feature selection and diversity measures for predicting the affinity of A(2B) adenosine receptor antagonists. ( 0,762383095206089 )
Comput Biol Chem - newDNA-Prot: Prediction of DNA-binding proteins by employing support vector machine and a comprehensive sequence representation. ( 0,760887735272703 )
Comput. Biol. Med. - Gene expression microarray classification using PCA-BEL. ( 0,760465701498225 )
Comput Math Methods Med - Mixed-norm regularization for brain decoding. ( 0,759262989483484 )
BMC Med Inform Decis Mak - Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes. ( 0,759143046225236 )
IEEE Trans Image Process - Human detection in images via piecewise linear support vector machines. ( 0,758572783166894 )
Comput Methods Programs Biomed - Performance comparison of machine learning methods for prognosis of hormone receptor status in breast cancer tissue samples. ( 0,758567726264255 )
J Med Syst - Classification of normal and diseased liver shapes based on Spherical Harmonics coefficients. ( 0,758138794127617 )
Artif Intell Med - Improving the accuracy of suicide attempter classification. ( 0,757306374685575 )
Comput Methods Programs Biomed - Functional activity maps based on significance measures and Independent Component Analysis. ( 0,756609624191102 )
IEEE Trans Image Process - Maximum Margin Correlation Filter: a new approach for localization and classification. ( 0,755479125696715 )
Neural Comput - The support feature machine: classification with the least number of features and application to neuroimaging data. ( 0,75307423724297 )
J Med Syst - Down syndrome diagnosis based on Gabor Wavelet Transform. ( 0,751661782439116 )
Comput Biol Chem - A protein fold classifier formed by fusing different modes of pseudo amino acid composition via PSSM. ( 0,750245728258015 )
IEEE J Biomed Health Inform - Support vector machine classification based on correlation prototypes applied to bone age assessment. ( 0,749039067605309 )
Comput Math Methods Med - Principal feature analysis: a multivariate feature selection method for fMRI data. ( 0,748168523719145 )
Int J Comput Assist Radiol Surg - Degree of contribution (DoC) feature selection algorithm for structural brain MRI volumetric features in depression detection. ( 0,745306015601005 )
Comput Methods Programs Biomed - An improved method of early diagnosis of smoking-induced respiratory changes using machine learning algorithms. ( 0,744955569155437 )