Abstract
Protein subcellular location prediction is an important step in interpreting protein function and identifying drug targets, and it has been studied extensively in recent years. Recent studies predict both single-site and multi-site proteins rather than single-site proteins alone. Computational methods based on Gene Ontology (GO) have certain advantages. However, existing GO-based methods ignore the relationships between GO terms. This paper proposes a multi-label subcellular location predictor, GS-mPloc, that considers not only GO terms but also the inter-term relationships, by using the semantic similarity between GO terms. Given a protein, a set of GO terms is retrieved by searching against the Gene Ontology database, and a GO feature vector of the protein is produced from them. The semantic similarity between GO terms is then used to refine the original GO features and obtain a new feature vector. A multi-label multi-class support vector machine classification algorithm (ML-SVM) is then applied to classify the new feature vector. Experimental results show that the proposed predictor significantly outperforms a predictor based on the original GO features as well as other state-of-the-art predictors.
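The feature-construction step described above can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the combination rule here (each term takes the highest similarity to any annotated term) is an assumption for illustration only, not the GS-mPloc method itself. The function names, the toy vocabulary, and the similarity value 0.6 are all hypothetical; the ML-SVM classification step is omitted.

```python
from typing import Dict, List, Set, Tuple


def go_feature_vector(protein_terms: Set[str], vocabulary: List[str]) -> List[float]:
    """Binary GO vector: 1.0 where the protein is annotated with the term."""
    return [1.0 if term in protein_terms else 0.0 for term in vocabulary]


def similarity_enhanced_vector(
    base: List[float],
    vocabulary: List[str],
    similarity: Dict[Tuple[str, str], float],
) -> List[float]:
    """Assumed rule: entry i becomes max_j sim(i, j) * base[j].

    A term the protein lacks still gets a non-zero weight when it is
    semantically close to a term the protein has.
    """
    enhanced = []
    for term_i in vocabulary:
        best = 0.0
        for term_j, value in zip(vocabulary, base):
            if value == 0.0:
                continue  # only annotated terms contribute
            if term_i == term_j:
                sim = 1.0  # a term is fully similar to itself
            else:
                # Similarity is symmetric, so look up either key order.
                sim = similarity.get(
                    (term_i, term_j), similarity.get((term_j, term_i), 0.0)
                )
            best = max(best, sim * value)
        enhanced.append(best)
    return enhanced


# Hypothetical toy data: three GO terms and one made-up similarity score.
vocabulary = ["GO:0005634", "GO:0005737", "GO:0005739"]
similarity = {("GO:0005737", "GO:0005739"): 0.6}

base = go_feature_vector({"GO:0005739"}, vocabulary)
enhanced = similarity_enhanced_vector(base, vocabulary, similarity)
print(base)      # [0.0, 0.0, 1.0]
print(enhanced)  # [0.0, 0.6, 1.0]
```

In this toy case the protein is annotated with only one term, yet the enhanced vector also assigns weight to the related term, which is the kind of inter-term information a purely binary GO vector would miss.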
Original language | English (US) |
---|---|
Pages (from-to) | 4615-4623 |
Number of pages | 9 |
Journal | Journal of Computational and Theoretical Nanoscience |
Volume | 12 |
Issue number | 11 |
DOIs | |
State | Published - Nov 1 2015 |
Externally published | Yes |
Bibliographical note
Generated from Scopus record by KAUST IRTS on 2023-09-20
ASJC Scopus subject areas
- General Materials Science
- Computational Mathematics
- General Chemistry
- Electrical and Electronic Engineering
- Condensed Matter Physics