Predicting Gene Functions and Phenotypes by combining Deep Learning and Ontologies

Student thesis: Doctoral Thesis


The number of available protein sequences is increasing rapidly, mainly as a consequence of the development and application of high-throughput sequencing technologies in the life sciences. A key question in the life sciences is to identify the functions of proteins and, furthermore, the phenotypes that may be associated with a loss (or gain) of function in these proteins. Protein functions are generally determined experimentally, and experimental determination will clearly not scale to the current, and rapidly increasing, number of available protein sequences (over 300 million). Identifying the phenotypes that result from loss of function is even more challenging, because the phenotype is modified by whole-organism interactions and environmental variables. Accurate computational prediction of protein functions and loss-of-function phenotypes would therefore be of significant value both to academic research and to the biotechnology industry. We developed and extended novel methods for representation learning and for predicting protein functions and their loss-of-function phenotypes. We use deep neural network algorithms and combine them with symbolic inference to form neural-symbolic algorithms. Our work significantly improves on previously developed methods for predicting protein functions through methodological advances in machine learning, the incorporation of broader data types that may be predictive of functions, and improved systems for neural-symbolic integration. The methods we developed are generic and can be applied to other domains in which similar types of structured and unstructured information exist. In the future, our methods can be applied to predict protein functions in metagenomic samples in order to evaluate the potential for discovering novel proteins of industrial value.
Our methods can also be applied to predict loss-of-function phenotypes in human genetics, and the results can be incorporated into a variant prioritization tool to help diagnose patients with Mendelian disorders.
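To make the idea of combining neural predictions with symbolic inference more concrete, here is a minimal, hypothetical sketch; it is not the method developed in the thesis. It assumes a model has already produced a score per ontology class for one protein, and applies one simple ontology rule: if a protein has a function, it also has every superclass of that function along the is_a hierarchy, so a superclass score should never be lower than the score of any of its subclasses. The toy ontology, the class names, and the functions `ancestors` and `make_consistent` are illustrative assumptions.

```python
from typing import Dict, List, Set

# Hypothetical toy ontology: each class maps to its direct superclasses (is_a edges).
ONTOLOGY: Dict[str, List[str]] = {
    "binding": [],
    "protein_binding": ["binding"],
    "enzyme_binding": ["protein_binding"],
    "ion_binding": ["binding"],
}


def ancestors(term: str, parents: Dict[str, List[str]]) -> Set[str]:
    """Return all superclasses of `term` (transitive closure of is_a), excluding itself."""
    seen: Set[str] = set()
    stack = list(parents.get(term, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, []))
    return seen


def make_consistent(scores: Dict[str, float],
                    parents: Dict[str, List[str]]) -> Dict[str, float]:
    """Raise each superclass score to at least the raw score of any of its subclasses.

    The result for a class is the maximum of its own raw score and the raw
    scores of all its descendants, so the hierarchy constraint holds.
    """
    result = dict(scores)
    for term, score in scores.items():
        for anc in ancestors(term, parents):
            result[anc] = max(result.get(anc, 0.0), score)
    return result


# Hypothetical raw scores, e.g. sigmoid outputs of a neural network for one protein.
raw = {"binding": 0.3, "protein_binding": 0.2, "enzyme_binding": 0.9, "ion_binding": 0.1}
consistent = make_consistent(raw, ONTOLOGY)
print(consistent)
```

Here the high score for the hypothetical `enzyme_binding` class lifts `protein_binding` and `binding` to 0.9, while the unrelated `ion_binding` branch keeps its own score. Taking the maximum over descendants is only one simple way to enforce consistency; richer neural-symbolic approaches may instead build ontology axioms into the model or its training objective.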
Date of Award: Apr 8 2020
Original language: English (US)
Awarding Institution
  • Computer, Electrical and Mathematical Sciences and Engineering
Supervisor: Robert Hoehndorf

Keywords
  • gene functions
  • phenotypes
  • ontologies
  • embeddings
  • deep neural networks
  • machine learning
