J Biomed Inform - A simulation to analyze feature selection methods utilizing gene ontology for gene expression classification.

Topics

{ featur(3375) classif(2383) classifi(1994) }
{ gene(2352) biolog(1181) express(1162) }
{ chang(1828) time(1643) increas(1301) }
{ concept(1167) ontolog(924) domain(897) }
{ general(901) number(790) one(736) }
{ perform(999) metric(946) measur(919) }
{ learn(2355) train(1041) set(1003) }
{ monitor(1329) mobil(1314) devic(1160) }
{ sampl(1606) size(1419) use(1276) }
{ model(3480) simul(1196) paramet(876) }
{ high(1669) rate(1365) level(1280) }
{ measur(2081) correl(1212) valu(896) }
{ model(2220) cell(1177) simul(1124) }
{ intervent(3218) particip(2042) group(1664) }
{ can(774) often(719) complex(702) }
{ method(1219) similar(1157) match(930) }
{ studi(2440) review(1878) systemat(933) }
{ studi(1119) effect(1106) posit(819) }
{ activ(1138) subject(705) human(624) }
{ treatment(1704) effect(941) patient(846) }
{ howev(809) still(633) remain(590) }
{ system(1050) medic(1026) inform(1018) }
{ signal(2180) analysi(812) frequenc(800) }
{ can(981) present(881) function(850) }
{ implement(1333) system(1263) develop(1122) }
{ survey(1388) particip(1329) question(1065) }
{ estim(2440) model(1874) function(577) }
{ method(2212) result(1239) propos(1039) }
{ system(1976) rule(880) can(841) }
{ imag(2830) propos(1344) filter(1198) }
{ imag(2675) segment(2577) method(1081) }
{ patient(2315) diseas(1263) diabet(1191) }
{ surgeri(1148) surgic(1085) robot(1054) }
{ clinic(1479) use(1117) guidelin(835) }
{ algorithm(1844) comput(1787) effici(935) }
{ extract(1171) text(1153) clinic(932) }
{ method(1557) propos(1049) approach(1037) }
{ method(984) reconstruct(947) comput(926) }
{ case(1353) use(1143) diagnosi(1136) }
{ data(3963) clinic(1234) research(1004) }
{ risk(3053) factor(974) diseas(938) }
{ spatial(1525) area(1432) region(1030) }
{ medic(1828) order(1363) alert(1069) }
{ group(2977) signific(1463) compar(1072) }
{ patient(1821) servic(1111) care(1106) }
{ health(1844) social(1437) communiti(874) }
{ structur(1116) can(940) graph(676) }
{ cancer(2502) breast(956) screen(824) }
{ drug(1928) target(777) effect(648) }
{ result(1111) use(1088) new(759) }
{ method(1969) cluster(1462) data(1082) }
{ model(3404) distribut(989) bayesian(671) }
{ imag(1947) propos(1133) code(1026) }
{ data(1737) use(1416) pattern(1282) }
{ inform(2794) health(2639) internet(1427) }
{ imag(1057) registr(996) error(939) }
{ bind(1733) structur(1185) ligand(1036) }
{ sequenc(1873) structur(1644) protein(1328) }
{ network(2748) neural(1063) input(814) }
{ take(945) account(800) differ(722) }
{ motion(1329) object(1292) video(1091) }
{ assess(1506) score(1403) qualiti(1306) }
{ framework(1458) process(801) describ(734) }
{ problem(2511) optim(1539) algorithm(950) }
{ error(1145) method(1030) estim(1020) }
{ data(1714) softwar(1251) tool(1186) }
{ design(1359) user(1324) use(1319) }
{ control(1307) perform(991) simul(935) }
{ care(1570) inform(1187) nurs(1089) }
{ search(2224) databas(1162) retriev(909) }
{ featur(1941) imag(1645) propos(1176) }
{ studi(1410) differ(1259) use(1210) }
{ research(1085) discuss(1038) issu(1018) }
{ import(1318) role(1303) understand(862) }
{ model(2341) predict(2261) use(1141) }
{ visual(1396) interact(850) tool(830) }
{ compound(1573) activ(1297) structur(1058) }
{ perform(1367) use(1326) method(1137) }
{ blood(1257) pressur(1144) flow(957) }
{ record(1888) medic(1808) patient(1693) }
{ health(3367) inform(1360) care(1135) }
{ ehr(2073) health(1662) electron(1139) }
{ state(1844) use(1261) util(961) }
{ research(1218) medic(880) student(794) }
{ patient(2837) hospit(1953) medic(668) }
{ model(2656) set(1616) predict(1553) }
{ data(2317) use(1299) case(1017) }
{ age(1611) year(1155) adult(843) }
{ cost(1906) reduc(1198) effect(832) }
{ data(3008) multipl(1320) sourc(1022) }
{ first(2504) two(1366) second(1323) }
{ time(1939) patient(1703) rate(768) }
{ use(2086) technolog(871) perceiv(783) }
{ analysi(2126) use(1163) compon(1037) }
{ use(976) code(926) identifi(902) }
{ use(1733) differ(960) four(931) }
{ decis(3086) make(1611) patient(1517) }
{ process(1125) use(805) approach(778) }
{ activ(1452) weight(1219) physic(1104) }
{ detect(2391) sensit(1101) algorithm(908) }

Abstract

Gene expression profile classification is a pivotal research domain that supports the transition from traditional to personalized medicine. A major challenge in classifying gene expression data is the small number of samples relative to the large number of genes. To address this problem, researchers have devised various feature selection algorithms to reduce the number of genes. Recent studies have experimented with using semantic similarity between genes in the Gene Ontology (GO) to improve feature selection. While a few studies discuss how to use GO for feature selection, no simulation study addresses when to use GO-based feature selection. To investigate this, we developed a novel simulation that generates binary-class datasets in which the differentially expressed genes between the two classes have some underlying relationship in GO. This allows us to investigate the effects of various factors, such as the relative connectedness of the underlying genes in GO, the mean magnitude of separation between differentially expressed genes (denoted by d), and the number of training samples. Our simulation results suggest that the connectedness in GO of the differentially expressed genes for a biological condition is the primary factor determining the efficacy of GO-based feature selection. In particular, as the connectedness of the differentially expressed genes increases, the improvement in classification accuracy increases. To quantify this notion of connectedness, we defined a measure called Biological Condition Annotation Level, BCAL(G), where G is a graph of differentially expressed genes.
Our main conclusions with respect to GO-based feature selection are the following: (1) it increases classification accuracy when BCAL(G) ≥ 0.696; (2) it decreases classification accuracy when BCAL(G) ≤ 0.389; (3) it provides marginal accuracy improvement when 0.389 < BCAL(G) < 0.696 and d < 1; (4) as the number of genes in a biological condition increases beyond 50 and d = 0.7, the improvement from GO-based feature selection decreases; and (5) we recommend not using GO-based feature selection when a biological condition has fewer than ten genes. Our results are derived from datasets preprocessed using RMA (Robust Multi-array Average), cases where d is between 0.3 and 2.5, and training sample sizes between 20 and 200; our conclusions are therefore limited to these specifications. Overall, this simulation is innovative and addresses the question of when SoFoCles-style feature selection should be used for classification instead of statistical ranking measures.
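
The numbered conclusions can be read as a rough decision rule. The Python sketch below is illustrative only and is not the authors' code: the function name `go_selection_advice`, the returned labels, and the order in which the checks are applied are all hypothetical. It also assumes that the thresholds in conclusions (1) and (2) are inclusive bounds (BCAL(G) at or above 0.696, and at or below 0.389), and it leaves out conclusion (4), which describes a trend rather than a cutoff.

```python
# Thresholds quoted in the abstract; treating them as inclusive bounds is an assumption.
BCAL_HIGH = 0.696   # conclusion (1)
BCAL_LOW = 0.389    # conclusion (2)
MIN_GENES = 10      # conclusion (5)
D_RANGE = (0.3, 2.5)  # range of d covered by the simulation


def go_selection_advice(bcal: float, d: float, n_genes: int) -> str:
    """Return a hypothetical label for one biological condition.

    bcal    -- BCAL(G) of the graph of differentially expressed genes
    d       -- mean magnitude of separation between the two classes
    n_genes -- number of differentially expressed genes in the condition
    """
    if not (D_RANGE[0] <= d <= D_RANGE[1]):
        return "outside simulated range"   # the abstract limits its claims to this range
    if n_genes < MIN_GENES:
        return "not recommended"           # conclusion (5); checking it first is an assumption
    if bcal >= BCAL_HIGH:
        return "accuracy gain expected"    # conclusion (1)
    if bcal <= BCAL_LOW:
        return "accuracy loss expected"    # conclusion (2)
    if d < 1:
        return "marginal gain expected"    # conclusion (3)
    return "not covered by the stated conclusions"


if __name__ == "__main__":
    for bcal, d, n in [(0.75, 1.2, 40), (0.30, 1.2, 40), (0.50, 0.8, 40), (0.75, 1.2, 5)]:
        print(bcal, d, n, "->", go_selection_advice(bcal, d, n))
```

The case of an intermediate BCAL(G) with d of 1 or more is not addressed in the abstract, so the sketch reports it as not covered instead of guessing.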

Cleaned Abstract

gene express profil classif pivot research domain assist transform tradit person medicin major challeng associ gene express data classif small number sampl relat larg number gene address problem research devis various featur select algorithm reduc number gene recent studi experi use semant similar gene gene ontolog go method improv featur select studi discuss use go featur select simul studi address use gobas featur select investig develop novel simul generat binari class dataset differenti express gene two class under relationship go allow us investig effect various factor relat connected under gene go mean magnitud separ differenti express gene denot d number train sampl simul result suggest connected go differenti express gene biolog condit primari factor determin efficaci gobas featur select particular connected differenti express gene increas classif accuraci improv increas quantifi notion connected defin measur call biolog condit annot level bcalg g graph differenti express gene main conclus respect gobas featur select follow increas classif accuraci bcalg decreas classif accuraci bcalg provid margin accuraci improv bcalg d number gene biolog condit increas beyond d improv gobas featur select decreas recommend use gobas featur select biolog condit less ten gene result deriv dataset preprocess use rma robust multiarray averag case d train sampl size therefor conclus limit specif overal simul innov address question sofoclesstyl featur select use classif instead statisticalbas rank measur

Similar Abstracts

Comput. Biol. Med. - Computerized system for recognition of autism on the basis of gene expression microarray data. ( 0,813962286524056 )
J Biomed Inform - An efficient statistical feature selection approach for classification of gene expression data. ( 0,810177482874364 )
J. Comput. Biol. - A hybrid BPSO-CGA approach for gene selection and classification of microarray data. ( 0,807971056867272 )
Comput. Biol. Med. - A hybrid feature selection method for DNA microarray data. ( 0,792571961345386 )
Comput. Biol. Med. - Exploring correlations in gene expression microarray data for maximum predictive-minimum redundancy biomarker selection and classification. ( 0,786956371144844 )
Comput. Biol. Med. - A method of tumor classification based on wavelet packet transforms and neighborhood rough set. ( 0,778377837068589 )
J Biomed Inform - A fast gene selection method for multi-cancer classification using multiple support vector data description. ( 0,769939838757036 )
Methods Inf Med - Correlation-based gene selection and classification using Taguchi-BPSO. ( 0,768204043678124 )
Comput. Biol. Med. - A novel class dependent feature selection method for cancer biomarker discovery. ( 0,766638584684096 )
Artif Intell Med - Self-focusing therapeutic gene delivery with intelligent gene vector swarms: intra-swarm signalling through receptor transgene expression in targeted cells. ( 0,761807042877146 )
Comput. Biol. Med. - Gene expression data classification using locally linear discriminant embedding. ( 0,759326162984924 )
Comput Math Methods Med - Recursive feature selection with significant variables of support vectors. ( 0,754008490505172 )
Comput. Biol. Med. - Decision forest for classification of gene expression data. ( 0,748003563927763 )
Artif Intell Med - Selective voting in convex-hull ensembles improves classification accuracy. ( 0,742237364023924 )
Comput Methods Programs Biomed - TC-VGC: a tumor classification system using variations in genes' correlation. ( 0,741517185902629 )
J. Comput. Biol. - Algorithms for MDC-based multi-locus phylogeny inference: beyond rooted binary gene trees on single alleles. ( 0,739093318428937 )
J Biomed Inform - Gene pathways and subnetworks distinguish between major glioma subtypes and elucidate potential underlying biology. ( 0,73646932126753 )
Methods Inf Med - High-content analysis in monastrol suppressor screens. A neural network-based classification approach. ( 0,732852525553794 )
Comput Biol Chem - Derivation of an artificial gene to improve classification accuracy upon gene selection. ( 0,730122955321087 )
Comput Math Methods Med - A novel weighted support vector machine based on particle swarm optimization for gene selection and tumor classification. ( 0,71885890785192 )
IEEE J Biomed Health Inform - Exploring robust diagnostic signatures for cutaneous melanoma utilizing genetic and imaging data. ( 0,715252061378083 )
Artif Intell Med - Classification of cancer cell death with spectral dimensionality reduction and generalized eigenvalues. ( 0,712677537554652 )
Comput. Biol. Med. - An ensemble of SVM classifiers based on gene pairs. ( 0,708530116066448 )
J Biomed Inform - Selecting significant genes by randomization test for cancer classification using gene expression data. ( 0,704690754634628 )
Comput. Biol. Med. - An experimental comparison of gene selection by Lasso and Dantzig selector for cancer classification. ( 0,703783319333651 )
AMIA Annu Symp Proc - Towards mechanism classifiers: expression-anchored Gene Ontology signature predicts clinical outcome in lung adenocarcinoma patients. ( 0,702472082942257 )
Comput. Biol. Med. - Sparse maximum margin discriminant analysis for feature extraction and gene selection on gene expression data. ( 0,701052490562527 )
Comput Methods Programs Biomed - Connecting high-dimensional mRNA and miRNA expression data for binary medical classification problems. ( 0,698119925086037 )
J Med Syst - Computer vision approach to morphometric feature analysis of basal cell nuclei for evaluating malignant potentiality of oral submucous fibrosis. ( 0,696946804120451 )
J. Comput. Biol. - Finding alternative expression quantitative trait loci by exploring sparse model space. ( 0,690129472709043 )
Artif Intell Med - Identifying a small set of marker genes using minimum expected cost of misclassification. ( 0,68621371507139 )
J. Comput. Biol. - Biomarker discovery using statistically significant gene sets. ( 0,682590227005857 )
Comput Methods Programs Biomed - A new hybrid intelligent system for accurate detection of Parkinson's disease. ( 0,682456207213681 )
Comput Methods Programs Biomed - Hepatitis disease diagnosis using a novel hybrid method based on support vector machine and simulated annealing (SVM-SA). ( 0,676011084810036 )
Comput Biol Chem - A computational method of predicting regulatory interactions in Arabidopsis based on gene expression data and sequence information. ( 0,670329447028985 )
Comput Biol Chem - Information-theoretic approaches to SVM feature selection for metagenome read classification. ( 0,669685680359862 )
Comput Biol Chem - Compact cancer biomarkers discovery using a swarm intelligence feature selection algorithm. ( 0,668919703477691 )
AMIA Annu Symp Proc - Mining disease fingerprints from within genetic pathways. ( 0,668106502995143 )
Comput. Biol. Med. - Neural system for heartbeats recognition using genetically integrated ensemble of classifiers. ( 0,666148222956036 )
Artif Intell Med - Texture feature ranking with relevance learning to classify interstitial lung disease patterns. ( 0,663837020399638 )
Brief. Bioinformatics - Combining multidimensional genomic measurements for predicting cancer prognosis: observations from TCGA. ( 0,663389495228997 )
J Am Med Inform Assoc - Complex-disease networks of trait-associated single-nucleotide polymorphisms (SNPs) unveiled by information theory. ( 0,66227163809015 )
Comput Biol Chem - Revealing weak differential gene expressions and their reproducible functions associated with breast cancer metastasis. ( 0,660914332493196 )
Comput Biol Chem - Multi objective SNP selection using pareto optimality. ( 0,655867725836963 )
Sci Data - Assessment of lipidomic species in hepatocyte lipid droplets from stressed mouse models. ( 0,655809959478027 )
Brief. Bioinformatics - Class-imbalanced classifiers for high-dimensional data. ( 0,655494234705537 )
IEEE J Biomed Health Inform - Using evolutional properties of gene networks in understanding survival prognosis of glioblastoma. ( 0,654772164759512 )
J Am Med Inform Assoc - 'N-of-1-pathways' unveils personal deregulated mechanisms from a single pair of RNA-Seq samples: towards precision medicine. ( 0,652742988469951 )
Neural Comput - An Infomax algorithm can perform both familiarity discrimination and feature extraction in a single network. ( 0,649033361521676 )
Comput Math Methods Med - Multiple suboptimal solutions for prediction rules in gene expression data. ( 0,648933399196702 )
Comput Methods Programs Biomed - Ensemble transcript interaction networks: a case study on Alzheimer's disease. ( 0,647001109884007 )
J Integr Bioinform - Network expansion and pathway enrichment analysis towards biologically significant findings from microarrays. ( 0,646968713986025 )
J Biomed Inform - Systems-based biological concordance and predictive reproducibility of gene set discovery methods in cardiovascular disease. ( 0,646587558118988 )
Comput. Biol. Med. - Impact of TGF-b on breast cancer from a quantitative proteomic analysis. ( 0,646122542870822 )
Comput Math Methods Med - Genomic and functional analysis of the toxic effect of tachyplesin I on the embryonic development of zebrafish. ( 0,645021757767416 )
Comput Math Methods Med - SVM versus MAP on accelerometer data to distinguish among locomotor activities executed at different speeds. ( 0,644711020432774 )
Comput Math Methods Med - Discrimination between Alzheimer's disease and mild cognitive impairment using SOM and PSO-SVM. ( 0,644653673074896 )
Artif Intell Med - Subpopulation-specific confidence designation for more informative biomedical classification. ( 0,644179290073066 )
J Med Syst - SVM feature selection based rotation forest ensemble classifiers to improve computer-aided diagnosis of Parkinson disease. ( 0,642259536801224 )
Comput Biol Chem - A novel divide-and-merge classification for high dimensional datasets. ( 0,641652381548225 )
Wiley Interdiscip Rev Syst Biol Med - Noncoding RNAs in gene regulation. ( 0,641313949707592 )
Comput Math Methods Med - Comparison of different EHG feature selection methods for the detection of preterm labor. ( 0,640862879439823 )
Brief. Bioinformatics - Ensemble learning algorithms for classification of mtDNA into haplogroups. ( 0,640790713045326 )
Wiley Interdiscip Rev Syst Biol Med - Using a systems biology approach to understand and study the mechanisms of metastasis. ( 0,640544694584064 )
Comput Biol Chem - Expression patterns of photoperiod and temperature regulated heading date genes in Oryza sativa. ( 0,639387716296549 )
Comput. Biol. Med. - Prediction of microRNA-regulated protein interaction pathways in Arabidopsis using machine learning algorithms. ( 0,636849842546341 )
Brief. Bioinformatics - Revealing the architecture of genetic and epigenetic regulation: a maximum likelihood model. ( 0,633523151654396 )
Comput. Biol. Med. - Fast and efficient lung disease classification using hierarchical one-against-all support vector machine and cost-sensitive feature selection. ( 0,633314946672008 )
Comput. Biol. Med. - Classification of EMG signals using PSO optimized SVM for diagnosis of neuromuscular disorders. ( 0,633278579727436 )
Comput Biol Chem - Disruption of murine Tcte3-3 induces tissue specific apoptosis via co-expression of Anxa5 and Pebp1. ( 0,632705963989822 )
Comput. Biol. Med. - Computational gene network study on antibiotic resistance genes of Acinetobacter baumannii. ( 0,631872114206661 )
Comput. Biol. Med. - An ensemble system for automatic sleep stage classification using single channel EEG signal. ( 0,630152979529893 )
J Am Med Inform Assoc - Network models of genome-wide association studies uncover the topological centrality of protein interactions in complex diseases. ( 0,630061723355417 )
Comput. Biol. Med. - Degrees of separation as a statistical tool for evaluating candidate genes. ( 0,628787185602379 )
Comput Biol Chem - Statistical analysis of combinatorial transcriptional regulatory motifs in human intron-containing promoter sequences. ( 0,626881016379846 )
J Biomed Inform - Prioritization of potential candidate disease genes by topological similarity of protein-protein interaction network and phenotype data. ( 0,626263583333264 )
Comput. Biol. Med. - A supervised orthogonal discriminant projection for tumor classification using gene expression data. ( 0,626115043823059 )
J Integr Bioinform - Towards prediction and prioritization of disease genes by the modularity of human phenome-genome assembled network. ( 0,625965234956933 )
J Integr Bioinform - On the parameter optimization of Support Vector Machines for binary classification. ( 0,625007812646487 )
J Am Med Inform Assoc - Extracting coordinated patterns of DNA methylation and gene expression in ovarian cancer. ( 0,624577493268426 )
Int J Comput Assist Radiol Surg - Building an ensemble system for diagnosing masses in mammograms. ( 0,624229509475136 )
Comput Math Methods Med - First comprehensive in silico analysis of the functional and structural consequences of SNPs in human GalNAc-T1 gene. ( 0,623810048921737 )
Comput Biol Chem - Identifying novel prostate cancer associated pathways based on integrative microarray data analysis. ( 0,623796789688472 )
J Med Syst - A robust multi-class feature selection strategy based on Rotation Forest Ensemble algorithm for diagnosis of Erythemato-Squamous diseases. ( 0,623415224267042 )
J Am Med Inform Assoc - A comparative analysis of methods for predicting clinical outcomes using high-dimensional genomic datasets. ( 0,62337004606858 )
J Integr Bioinform - An integrative bioinformatics framework for genome-scale multiple level network reconstruction of rice. ( 0,622948359316875 )
Artif Intell Med - Computer-aided diagnosis of pulmonary nodules using a two-step approach for feature selection and classifier ensemble construction. ( 0,622641969602915 )
Comput. Biol. Med. - Pairwise FCM based feature weighting for improved classification of vertebral column disorders. ( 0,621543187792465 )
AMIA Annu Symp Proc - An ontology-neutral framework for enrichment analysis. ( 0,621475757088807 )
Comput. Biol. Med. - Heartbeat classification using disease-specific feature selection. ( 0,621440009602189 )
Brief. Bioinformatics - Extracting reaction networks from databases-opening Pandora's box. ( 0,621375523947791 )
Comput Biol Chem - In silico analysis of cis-acting regulatory elements in 5' regulatory regions of sucrose transporter gene families in rice (Oryza sativa Japonica) and Arabidopsis thaliana. ( 0,621286319254034 )
J Med Syst - A new expert system for diagnosis of lung cancer: GDA-LS_SVM. ( 0,621160939432093 )
Wiley Interdiscip Rev Syst Biol Med - The zebrafish: scalable in vivo modeling for systems biology. ( 0,619443680569837 )
J Biomed Inform - A two step method to identify clinical outcome relevant genes with microarray data. ( 0,619348894784755 )
Wiley Interdiscip Rev Syst Biol Med - Systems biology of adipose tissue metabolism: regulation of growth, signaling and inflammation. ( 0,618566186635872 )
J Am Med Inform Assoc - An integrated approach to identify causal network modules of complex diseases with application to colorectal cancer. ( 0,618473593325966 )
Comput Methods Programs Biomed - Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis. ( 0,617663022382741 )
J Biomed Inform - Automatic figure classification in bioscience literature. ( 0,617141634438354 )
J. Comput. Biol. - An algorithm for efficient identification of branched metabolic pathways. ( 0,616684015217647 )