A novel method for improved accuracy of transcription factor binding site prediction

Abdullah M. Khamis, Olaa Amin Motwalli, Romina Oliva, Boris R. Jankovic, Yulia Medvedeva, Haitham Ashoor, Magbubah Essack, Xin Gao, Vladimir B. Bajic

Research output: Contribution to journal › Article › peer-review

34 Scopus citations

Abstract

Identifying transcription factor (TF) binding sites (TFBSs) is important in the computational inference of gene regulation. Widely used computational methods of TFBS prediction based on position weight matrices (PWMs) usually have high false positive rates. Moreover, computational studies of transcription regulation in eukaryotes frequently require numerous PWM models of TFBSs because of the large number of TFs involved. To overcome these problems, we developed DRAF, a novel method for TFBS prediction that requires only 14 prediction models for 232 human TFs while significantly improving prediction accuracy. DRAF models use more features than PWM models, as they combine information from TFBS sequences and physicochemical properties of TF DNA-binding domains into machine learning models. Evaluation of DRAF on 98 human ChIP-seq datasets shows, on average, 1.54-, 1.96- and 5.19-fold reductions in false positives at the same sensitivities compared to models from HOCOMOCO, TRANSFAC and DeepBind, respectively. This observation suggests that the PWM models for TFBS prediction can be efficiently replaced by a small number of DRAF models that significantly improve prediction accuracy. The DRAF method is implemented in a web tool and as stand-alone software, freely available at http://cbrc.kaust.edu.sa/DRAF.
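For readers unfamiliar with the PWM baseline mentioned in the abstract, the following is a minimal sketch of log-odds PWM scanning and of how a fold reduction in false positives can be computed. It is not the DRAF method and is not taken from the paper: the toy motif counts, the pseudocount, the uniform background, the threshold and all function names are assumptions chosen purely for illustration.

```python
import math

BASES = "ACGT"

def pwm_from_counts(counts, pseudocount=0.5, background=0.25):
    """Turn per-position base counts into log2-odds weights.

    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column.values()) + pseudocount * len(BASES)
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm, threshold):
    """Return (position, score) for each window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(pwm[j][base] for j, base in enumerate(window))
        if score >= threshold:
            hits.append((i, score))
    return hits

def fold_reduction(fp_baseline, fp_new):
    """Ratio of false positives (baseline / new) at a matched sensitivity."""
    if fp_new <= 0:
        raise ValueError("fp_new must be positive")
    return fp_baseline / fp_new

# Hypothetical counts for a 4-bp motif resembling "TGAC" (not real TF data).
toy_counts = [
    {"T": 9, "A": 1},
    {"G": 8, "C": 2},
    {"A": 10},
    {"C": 7, "T": 3},
]
pwm = pwm_from_counts(toy_counts)
hits = scan("AATGACGGTGACTT", pwm, threshold=4.0)
print([pos for pos, _ in hits])  # windows starting at positions 2 and 8
```

Lowering the threshold in `scan` recovers more true sites but also admits more spurious windows, which is the sensitivity versus false-positive trade-off the abstract refers to. A reported value such as 1.96-fold corresponds to `fold_reduction` returning 1.96 when both methods are tuned to the same sensitivity.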
Original language: English (US)
Pages (from-to): e72-e72
Number of pages: 1
Journal: Nucleic Acids Research
Volume: 46
Issue number: 12
State: Published - Apr 2 2018

Bibliographical note

KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): BAS/1/1606-01-01
Acknowledgements: The computational analysis for this study was performed on the Dragon and Snapdragon compute clusters of the Computational Bioscience Research Center at KAUST. Funding: King Abdullah University of Science and Technology (KAUST) [BAS/1/1606-01-01 to V.B.B.]. Funding for open access charge: KAUST [BAS/1/1606-01-01].
