Abstract
Genomic data, and biomedical data more generally, are often characterized by high dimensionality. An input selection procedure can serve two objectives: highlighting the relevant variables (genes) and possibly improving classification results. In this paper, we propose a wrapper approach to gene selection for the classification of gene expression data, using simulated annealing together with supervised classification. The proposed approach can perform global combinatorial searches through the space of all possible input subsets, can handle numerical, categorical, or mixed inputs, and is able to find (sub-)optimal subsets of inputs that give low classification errors. The method has been tested on publicly available bioinformatics data sets using support vector machines and on a mixed-type data set using classification trees. We also propose some heuristics that speed up convergence. The experimental results highlight the ability of the method to select minimal sets of relevant features.
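As an illustration only, the general idea of a simulated-annealing wrapper can be sketched as follows. This is not the paper's implementation: the abstract does not specify the cost function, move operator, or cooling schedule, so the single-feature flip, the geometric cooling, the subset-size penalty, and all names (`anneal_select`, `cost`, `make_toy_data`) are assumptions. To keep the sketch dependency-free, a leave-one-out nearest-centroid classifier stands in for the support vector machines and classification trees mentioned above.

```python
import math
import random


def nearest_centroid_error(X, y, subset):
    """Leave-one-out error of a nearest-centroid classifier on the chosen columns."""
    if not subset:
        return 1.0  # assumption: an empty subset is scored as the worst case
    cols = sorted(subset)
    errors = 0
    for i in range(len(X)):
        # Per-class sums and counts, computed without the held-out sample i.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for k, c in enumerate(cols):
                acc[k] += row[c]
            counts[label] = counts.get(label, 0) + 1

        def dist(label):
            # Squared Euclidean distance from sample i to the class centroid.
            return sum((X[i][c] - sums[label][k] / counts[label]) ** 2
                       for k, c in enumerate(cols))

        predicted = min(sums, key=dist)
        errors += predicted != y[i]
    return errors / len(X)


def cost(X, y, subset, size_penalty=0.01):
    """Wrapper objective (assumed form): error plus a small penalty per feature."""
    return nearest_centroid_error(X, y, subset) + size_penalty * len(subset)


def anneal_select(X, y, n_iter=200, t0=0.2, alpha=0.98, seed=0):
    """Simulated annealing over feature subsets; returns (best_subset, best_cost)."""
    rng = random.Random(seed)
    n_features = len(X[0])
    current = {f for f in range(n_features) if rng.random() < 0.5}
    current_cost = cost(X, y, current)
    best, best_cost = set(current), current_cost
    t = t0
    for _ in range(n_iter):
        candidate = set(current)
        candidate ^= {rng.randrange(n_features)}  # flip one feature in or out
        cand_cost = cost(X, y, candidate)
        delta = cand_cost - current_cost
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, cand_cost
            if current_cost < best_cost:
                best, best_cost = set(current), current_cost
        t *= alpha  # geometric cooling (assumed schedule)
    return best, best_cost


def make_toy_data(n_samples=40, n_features=8, seed=1):
    """Synthetic two-class data in which only feature 0 carries class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] += 4.0 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    subset, value = anneal_select(X, y)
    print(sorted(subset), round(value, 3))
```

Because the classifier is called only through `cost`, a different learner (for example an SVM or a classification tree for mixed inputs) could be substituted there without touching the search loop, which is the defining property of a wrapper method.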
Original language | English (US) |
---|---|
Pages (from-to) | 1471-1482 |
Number of pages | 12 |
Journal | Soft Computing |
Volume | 15 |
Issue number | 8 |
DOIs | |
State | Published - Aug 2011 |
Keywords
- Classification trees
- DNA microarrays
- Gene selection
- Input selection
- Simulated annealing
- Support vector machines
ASJC Scopus subject areas
- Software
- Theoretical Computer Science
- Geometry and Topology