DEEPStack-RBP: Accurate identification of RNA-binding proteins based on autoencoder feature selection and deep stacking ensemble classifier

Qinqin Wei, Qingmei Zhang, Hongli Gao, Tao Song, Adil Salhi, Bin Yu

Research output: Contribution to journal › Article › peer-review

Abstract

RNA-binding proteins (RBPs) are involved in a number of biological processes, such as RNA synthesis, protein folding, and alternative splicing. Predicting RBPs can facilitate the discovery and treatment of human diseases, such as muscle atrophy, nervous system diseases, and cancer. However, identifying RBPs with experimental methods still poses various challenges. Computational methods, and deep learning in particular, are being deployed to alleviate some of these challenges and to provide new avenues of investigation in RBP prediction. Here, we propose DEEPStack-RBP, a novel RBP prediction tool based on deep learning and ensemble learning. First, conjoint triad (CT), local descriptors (LD), pseudo amino acid composition (PseAAC), multivariate mutual information (MMI), and position-specific scoring matrix-transition probability composition (PSSM-TPC) are applied to extract multiple features from the proteins. Subsequently, an autoencoder (AE) is used to eliminate redundancy in the features, and SMOTE-ENN is employed to balance the samples by reducing the difference between the numbers of positive and negative cases. Finally, a stacked ensemble classifier composed of bidirectional long short-term memory (BiLSTM), gated recurrent unit (GRU), and support vector machine (SVM) is used for prediction. On the training dataset RBP9873, the accuracy (ACC) of DEEPStack-RBP reaches 98.76% with a Matthews correlation coefficient (MCC) of 0.9508. For the three independent test datasets of Human, S. cerevisiae, and A. thaliana, the accuracy of the model is 97.16%, 97.67%, and 99.57%, respectively, and the MCC is 0.9405, 0.9499, and 0.9906, respectively. These results show that DEEPStack-RBP can be used as a powerful tool for RBP prediction.
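
The abstract reports results as ACC and MCC but does not spell out the formulas. As a minimal sketch, assuming the usual confusion-matrix definitions of accuracy and the Matthews correlation coefficient for a binary classifier, the two metrics can be computed as follows. The function names `accuracy` and `mcc` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Ranges from -1 (total disagreement) to +1 (perfect prediction).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Illustrative counts only (not results from the paper).
example_acc = accuracy(tp=50, tn=40, fp=5, fn=5)
example_mcc = mcc(tp=50, tn=40, fp=5, fn=5)
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, so it stays informative when positive and negative classes differ in size, which is likely why both are reported together.
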
Original language: English (US)
Pages (from-to): 109875
Journal: Knowledge-Based Systems
Volume: 256
State: Published - Sep 23 2022

Bibliographical note

KAUST Repository Item: Exported on 2022-10-03
Acknowledgements: We thank anonymous reviewers for valuable suggestions and comments. This work was supported by the National Natural Science Foundation of China (Nos. 62172248, 61863010), the Natural Science Foundation of Shandong Province of China (No. ZR2021MF098), and the Key Laboratory Open Foundation of Hainan Province, China (No. JSKX202001).

ASJC Scopus subject areas

  • Management Information Systems
  • Artificial Intelligence
  • Software
  • Information Systems and Management
