In classification problems, it is always important to use the suitable combination of features that will be employed by classifiers. Generating the right combination of features usually results in good classifiers. In the situation when the problem is not well understood, data items are usually described by many features in the hope that some of these may be the relevant or most relevant ones.
In this study, we focus on one such problem related to genes implicated in ovarian cancer (OC). We try to recognize two important OC-related gene groups: oncogenes, which support the development and progression of OC, and oncosuppressors, which oppose such tendencies. For this, we use the properties of promoters of these genes. We identified potential “regulatory features” that characterize OC-related oncogenes and oncosuppressors promoters. In our study, we used 211 oncogenes and 39 oncosuppressors. For these, we identified 538 characteristic sequence motifs from their promoters. Promoters are annotated by these motifs and derived feature vectors used to develop classification models. We made a comparison of a number of classification models in their ability to distinguish oncogenes from oncosuppressors.
Based on 10-fold cross-validation, the resultant model was able to separate the two classes with sensitivity of 96% and specificity of 100% with the complete set of features. Moreover, we developed another recognition model where we attempted to distinguish oncogenes and oncosuppressors as one group from other OC-related genes. That model achieved accuracy of 82%. We believe that the results of this study will help in discovering other OC-related oncogenes and oncosuppressors not identified as yet.
|Date of Award||Dec 2012|
|Original language||English (US)|
- Computer, Electrical and Mathematical Sciences and Engineering
|Supervisor||Vladimir Bajic (Supervisor)|
- machine learning
- ovarian cancer