Artif Intell Med - Hybrid genetic algorithm-neural network: feature extraction for unpreprocessed microarray data.


OBJECTIVE: Suitable techniques for microarray analysis have been widely researched, particularly for the study of marker genes expressed in a specific type of cancer. Most machine learning methods that have been applied to significant gene selection focus on the classification ability rather than the selection ability of the method. These methods also require the microarray data to be preprocessed before analysis takes place. The objective of this study is to develop a hybrid genetic algorithm-neural network (GANN) model that emphasises feature selection and can operate on unpreprocessed microarray data.

METHOD: The GANN is a hybrid model in which the fitness value of the genetic algorithm (GA) is based upon the number of samples correctly labelled by a standard feedforward artificial neural network (ANN). The model is evaluated using two benchmark microarray datasets with different array platforms and differing numbers of classes: a 2-class oligonucleotide microarray dataset for acute leukaemia and a 4-class complementary DNA (cDNA) microarray dataset for small round blue cell tumours (SRBCTs). The underlying concept of the GANN algorithm is to select highly informative genes by co-evolving both the GA fitness function and the ANN weights at the same time.

RESULTS: The novel GANN selected approximately 50% of the same genes as the original studies. This may indicate that these common genes are more biologically significant than other genes in the datasets. The remaining 50% of the significant genes identified were used to build predictive models, and for both datasets the models based on the set of genes extracted by the GANN method produced more accurate results. The results also suggest that the GANN method can not only detect genes that are exclusively associated with a single cancer type but can also explore genes that are differentially expressed in multiple cancer types.

CONCLUSIONS: The results show that the GANN model has successfully extracted statistically significant genes from the unpreprocessed microarray data, as well as known biologically significant genes. We also show that assessing the biological significance of genes based on classification accuracy may be misleading: although the GANN's set of extra genes proves to be more statistically significant than those selected by other methods, a biological assessment of these genes is highly recommended to confirm their functionality.
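
The fitness idea described under METHOD, a GA whose fitness is the number of samples correctly labelled by a feedforward ANN, can be illustrated with a minimal sketch. This is not the authors' implementation: the chromosome layout (a gene subset plus the ANN weights evolved together), the mutation-only search with elitist survival, the single hidden layer, the 2-class output, and all names and parameter values below are assumptions chosen for brevity.

```python
import math
import random


def forward(weights, inputs, n_hidden):
    """One-hidden-layer feedforward pass; returns class 0 or 1.

    Assumed weight layout: for each hidden unit, its input weights then its
    bias; afterwards the output weights then the output bias.
    """
    n_in = len(inputs)
    pos = 0
    hidden = []
    for _ in range(n_hidden):
        total = weights[pos + n_in]  # hidden-unit bias
        for i, x in enumerate(inputs):
            total += weights[pos + i] * x
        hidden.append(math.tanh(total))
        pos += n_in + 1
    out = weights[pos + n_hidden]  # output bias
    for j, h in enumerate(hidden):
        out += weights[pos + j] * h
    return 1 if out > 0 else 0


def fitness(chrom, X, y, n_hidden):
    """GA fitness: number of samples the ANN labels correctly."""
    genes, weights = chrom
    return sum(
        forward(weights, [row[g] for g in genes], n_hidden) == label
        for row, label in zip(X, y)
    )


def random_chromosome(n_genes, k, n_hidden, rng):
    """A chromosome is (k distinct gene indices, ANN weights)."""
    n_weights = n_hidden * (k + 1) + n_hidden + 1
    genes = rng.sample(range(n_genes), k)
    return genes, [rng.uniform(-1, 1) for _ in range(n_weights)]


def mutate(chrom, n_genes, rng, gene_rate=0.2, sigma=0.3):
    """Occasionally swap one selected gene; always jitter the weights."""
    genes, weights = list(chrom[0]), list(chrom[1])
    unused = [g for g in range(n_genes) if g not in genes]
    if unused and rng.random() < gene_rate:
        genes[rng.randrange(len(genes))] = rng.choice(unused)
    return genes, [w + rng.gauss(0, sigma) for w in weights]


def evolve(X, y, k=2, n_hidden=2, pop_size=30, generations=40, seed=0):
    """Evolve gene subsets and ANN weights together (mutation-only sketch)."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    pop = [random_chromosome(n_genes, k, n_hidden, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: fitness(c, X, y, n_hidden), reverse=True)
        survivors = pop[: pop_size // 2]  # elitist: the best half survives
        children = [
            mutate(rng.choice(survivors), n_genes, rng)
            for _ in range(pop_size - len(survivors))
        ]
        pop = survivors + children
    best = max(pop, key=lambda c: fitness(c, X, y, n_hidden))
    return best, fitness(best, X, y, n_hidden)
```

Calling `evolve(X, y)` on a list of expression rows `X` and 0/1 labels `y` returns the best chromosome and its fitness; the gene indices in that chromosome are the selected features. A 4-class problem such as the SRBCT dataset would need a multi-output network, which this sketch omits.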


