Abstract
DNA N4-methylcytosine (4mC) and DNA N6-methyladenine (6mA) are significant epigenetic modifications. 4mC is closely related to the restriction-modification system, and 6mA is involved in various cellular activities. To further explore their functional mechanisms and biological significance, and to overcome the narrow coverage of traditional experimental methods, an efficient and widely applicable prediction method is needed. In this work, we develop a prediction method named 4mCi6mA-BGC to predict 4mC and 6mA sites. First, we employ binary encoding, K-mer nucleotide frequency (K-mer), pseudo K-tuple nucleotide composition (PseKNC), dinucleotide-based auto covariance (DAC) and monoDiKGap theoretical description (MonoDiKGap) to encode DNA sequences. Then, the elastic net is employed for feature selection, and the optimized feature space is fed into a deep learning framework composed of a bidirectional gated recurrent unit and a convolutional neural network. The benchmark data comprise six datasets, which contain 14 328 4mC sites from different species. The results of 10-fold cross-validation indicate that the prediction accuracy of 4mCi6mA-BGC significantly exceeds that of existing prediction methods. In addition, we use the independent datasets Rice and Arabidopsis thaliana to further confirm the predictive ability of 4mCi6mA-BGC. Compared with the existing prediction methods, 4mCi6mA-BGC shows the best prediction performance. These comprehensive results indicate that our method can identify DNA modification sites represented by 4mC and 6mA sites.
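As a rough illustration of the first two encodings named in the abstract, the sketch below shows one common way to compute a binary (one-hot) encoding and a K-mer frequency vector for a DNA sequence. The abstract does not specify the exact definitions used by 4mCi6mA-BGC, so the A, C, G, T ordering, the all-zero vector for unknown bases, the overlapping-window counting, and the function names `binary_encode` and `kmer_frequency` are all assumptions made for illustration.

```python
from itertools import product

NUCLEOTIDES = "ACGT"  # assumed ordering; the paper may use a different one


def binary_encode(seq):
    """One-hot encode each nucleotide as 4 bits (hypothetical helper)."""
    table = {
        base: [1 if i == j else 0 for j in range(4)]
        for i, base in enumerate(NUCLEOTIDES)
    }
    encoded = []
    for base in seq.upper():
        # Assumption: bases outside A/C/G/T map to an all-zero vector.
        encoded.extend(table.get(base, [0, 0, 0, 0]))
    return encoded


def kmer_frequency(seq, k=2):
    """Normalized frequency of each k-mer over overlapping windows
    (hypothetical helper; the output has 4**k entries)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing unknown bases
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]
```

In a pipeline like the one described, vectors such as these would be concatenated with the other descriptors before feature selection; that concatenation step is not shown here.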
Original language | English (US) |
---|---|
Pages (from-to) | 103566 |
Journal | Biomedical Signal Processing and Control |
Volume | 75 |
State | Published - Feb 12 2022 |
Bibliographical note
KAUST Repository Item: Exported on 2022-04-26
Acknowledged KAUST grant number(s): FCC/1/1976-18-01, REI/1/4216-01-01, REI/1/4437-01-01, REI/1/4473-01-01, REI/1/4742-01-01, URF/1/4098-01-01, URF/1/4379-01-0
Acknowledgements: We thank anonymous reviewers for valuable suggestions and comments. This work was supported by the National Natural Science Foundation of China (No. 62172248), the Natural Science Foundation of Shandong Province of China (No. ZR2021MF098), and the King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) under award numbers (Nos. FCC/1/1976-18-01, REI/1/4216-01-01, REI/1/4437-01-01, REI/1/4473-01-01, URF/1/4379-01-01, REI/1/4742-01-01 and URF/1/4098-01-01).