TY - JOUR
T1 - DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier
AU - Kulmanov, Maxat
AU - Khan, Mohammed Asif
AU - Hoehndorf, Robert
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): FCC/1/1976-08-01
Acknowledgements: This work was supported by funding from King Abdullah University of Science and Technology (KAUST) [FCC/1/1976-08-01]
PY - 2017/09/27
Y1 - 2017/09/27
N2 - Motivation
A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for a few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem.
Results
We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein–protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations.
UR - http://hdl.handle.net/10754/625903
UR - https://academic.oup.com/bioinformatics/article/doi/10.1093/bioinformatics/btx624/4265461/DeepGO-predicting-protein-functions-from-sequence
UR - http://www.scopus.com/inward/record.url?scp=85042523872&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btx624
DO - 10.1093/bioinformatics/btx624
M3 - Article
C2 - 29028931
SN - 1367-4803
VL - 34
SP - 660
EP - 668
JO - Bioinformatics
JF - Bioinformatics
IS - 4
ER -