RPI-MDLStack: Predicting RNA–protein interactions through deep learning with stacking strategy and LASSO

Bin Yu, Xue Wang, Yaqun Zhang, Hongli Gao, Yifei Wang, Yushuang Liu, Xin Gao

Research output: Contribution to journal › Article › peer-review

21 Scopus citations

Abstract

RNA–protein interactions (RPI) play a crucial role in fundamental cellular physiological processes. Traditional methods for identifying RPI rely on expensive and labor-intensive biological experiments, and existing computational methods are far from satisfactory, so there is a pressing need for more cost-effective methods to predict RPI. In this study, a stacking ensemble deep learning framework (named RPI-MDLStack) is constructed for RPI prediction. First, sequential, physicochemical, structural and evolutionary information is extracted from RNA and protein sequences using eight feature extraction methods. Then, the optimal feature set is generated by using the least absolute shrinkage and selection operator (LASSO) to eliminate redundancy from the fused features. Following the stacking strategy, the optimal features are first learned by a combination of base classifiers: multilayer perceptron (MLP), support vector machine (SVM), random forest (RF), gated recurrent unit (GRU) and deep neural network (DNN). Finally, the prediction scores of the base classifiers are fed into a discriminative model for further training. The results of 5-fold cross-validation demonstrate the superior identification performance of RPI-MDLStack, with accuracies of 96.7%, 87.3%, 94.6%, 97.1% and 89.5% on RPI488, RPI369, RPI2241, RPI1807 and RPI1446, respectively. Additionally, RPI-MDLStack achieves an overall prediction accuracy of 97.8% on the independent tests when trained on RPI488. Compared with other state-of-the-art RPI prediction methods on the same datasets, RPI-MDLStack is more robust and stable for predicting RPI.
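The LASSO-then-stacking workflow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scikit-learn as the toolkit, uses synthetic data from `make_classification` in place of the eight real RNA/protein feature encodings, keeps only the MLP, SVM and RF base learners (the GRU and DNN would need a deep-learning framework), and substitutes a hypothetical `LogisticRegression` meta-model for the unspecified "discriminative model". The `alpha` value and network sizes are illustrative guesses.

```python
"""Minimal sketch of a LASSO + stacking pipeline in the spirit of RPI-MDLStack.

Assumptions (not from the paper): scikit-learn as the toolkit, synthetic data
instead of real RNA/protein features, LogisticRegression as the meta-model,
and only MLP, SVM and RF base learners (GRU and DNN are omitted).
"""
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_model(alpha: float = 0.01) -> Pipeline:
    """Scale features, keep those LASSO does not shrink to zero, then stack classifiers."""
    base_learners = [
        ("mlp", MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)),
        ("svm", SVC(kernel="rbf")),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ]
    stack = StackingClassifier(
        estimators=base_learners,
        # Hypothetical meta-model; the paper only says "a discriminative model".
        final_estimator=LogisticRegression(),
        # Base-learner scores seen by the meta-model come from out-of-fold predictions.
        cv=5,
    )
    return Pipeline([
        ("scale", StandardScaler()),
        # Lasso is regressed on the 0/1 labels; for an L1 model, SelectFromModel's
        # default threshold is 1e-5, so features with (near-)zero coefficients are dropped.
        ("lasso", SelectFromModel(Lasso(alpha=alpha))),
        ("stack", stack),
    ])


# Synthetic stand-in for the fused sequence/physicochemical/structural/evolutionary features.
X, y = make_classification(n_samples=300, n_features=100, n_informative=10, random_state=0)

model = build_model()
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Refit on all data to inspect how many features survive the LASSO step.
model.fit(X, y)
n_kept = int(model.named_steps["lasso"].get_support().sum())
print(f"features kept by LASSO: {n_kept} of {X.shape[1]}")
```

Wrapping feature selection inside the `Pipeline` means LASSO is refit within each cross-validation fold, so the reported accuracy is not inflated by selecting features on the test folds.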
Original language: English (US)
Pages (from-to): 108676
Journal: Applied Soft Computing
Volume: 120
State: Published - Mar 18 2022

Bibliographical note

KAUST Repository Item: Exported on 2022-04-21
Acknowledged KAUST grant number(s): FCC/1/1976-18-01, REI/1/4216-01-01, REI/1/4473-01-01, REI/1/4742-01-01, URF/1/4098-01-01, URF/1/4379-01-0
Acknowledgements: We thank anonymous reviewers for valuable suggestions and comments. This work was supported by the National Natural Science Foundation of China (No. 62172248), the Natural Science Foundation of Shandong Province of China (No. ZR2021MF098), and the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under award numbers (Nos. FCC/1/1976-18-01, REI/1/4216-01-01, REI/1/4437-01-01, REI/1/4473-01-01, URF/1/4379-01-01, REI/1/4742-01-01 and URF/1/4098-01-01)

ASJC Scopus subject areas

  • Software
