OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features

Maha A. Thafar, Somayah Albaradei, Mahmut Uludag, Mona Alshahrani, Takashi Gojobori, Magbubah Essack*, Xin Gao*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

2 Scopus citations


Late-stage drug development failures are usually a consequence of ineffective targets. Proper target identification is therefore needed, and computational approaches may make it possible, for two reasons: effective targets have disease-relevant biological functions, and omics data reveal the proteins involved in these functions; in addition, properties that favor binding between a drug and its target can be deduced from the protein’s amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets. First, we created the “OncologyTT” datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes (omics features, BERT embeddings of the proteins’ amino acid sequences, and the integration of the two) and used each set separately to train and test the DL classifiers. The models achieved high prediction performance in terms of area under the curve (AUC): above 0.88 for all cancer types, with a maximum of 0.95 for leukemia. OncoRTT also outperformed the state-of-the-art method on that method's own data in five of the seven cancer types assessed by both methods. Furthermore, OncoRTT predicted novel therapeutic targets from new test data related to these seven cancer types. We further corroborated these results with validation evidence from the Open Targets Platform and a case study of the top-10 predicted therapeutic targets for lung cancer.
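
As a minimal illustration of two ideas the abstract relies on, the sketch below concatenates per-gene omics features with sequence embeddings into one integrated feature vector and computes AUC as the probability that a known target is scored above a non-target. It is not the OncoRTT implementation: the function names `integrate_features` and `roc_auc` are hypothetical, the numbers are toy values rather than data from the paper, and the DL classifiers themselves are not shown.

```python
from typing import List, Sequence


def integrate_features(omics: Sequence[Sequence[float]],
                       embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate each gene's omics features with its sequence embedding.

    Assumption: row i of both inputs describes the same gene/protein.
    """
    if len(omics) != len(embeddings):
        raise ValueError("both feature sets must cover the same genes")
    return [list(o) + list(e) for o, e in zip(omics, embeddings)]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: two genes, 2 omics features and a 3-dimensional embedding each.
integrated = integrate_features([[0.1, 0.2], [0.3, 0.4]],
                                [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(len(integrated[0]))  # each integrated vector has 2 + 3 = 5 features

# Toy classifier scores for four genes (1 = known target, 0 = non-target).
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))  # 3 of 4 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the reported values of 0.88 to 0.95.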

Original language: English (US)
Article number: 1139626
Journal: Frontiers in Genetics
State: Published - 2023

Bibliographical note

Funding Information:
The research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST) through grant awards Nos. BAS/1/1059-01-01, BAS/1/1624-01-01, FCC/1/1976-47-01, FCC/1/1976-26-01, URF/1/3450-01-01, REI/1/4216-01-01, REI/1/4437-01-01, REI/1/4473-01-01, and URF/1/4098-01-01.

Publisher Copyright:
Copyright © 2023 Thafar, Albaradei, Uludag, Alshahrani, Gojobori, Essack and Gao.

Keywords


  • bioinformatics
  • colon cancer
  • deep neural network
  • lung cancer
  • machine learning
  • omics
  • sequence embedding
  • target identification

ASJC Scopus subject areas

  • Molecular Medicine
  • Genetics
  • Genetics (clinical)

