Abstract
Our current knowledge of eukaryotic promoters indicates that they have a complex architecture, often composed of numerous functional motifs. Most known promoters include multiple, and in some cases mutually exclusive, transcription start sites (TSSs). Moreover, TSS selection depends on the cell or tissue type, developmental stage and environmental conditions. Such complex promoter structures make their computational identification notoriously difficult. Here, we present TSSPlant, a novel tool that predicts both TATA and TATA-less promoters in sequences from a wide spectrum of plant genomes. The tool was developed using large promoter collections from ppdb and PlantProm DB. It uses eighteen significant compositional and signal features of plant promoter sequences, selected in this study, as inputs to an artificial neural network-based model trained by the backpropagation algorithm. TSSPlant achieves significantly higher accuracy than the next best promoter prediction program for both TATA promoters (MCC≃0.84 and F1-score≃0.91 versus MCC≃0.51 and F1-score≃0.71) and TATA-less promoters (MCC≃0.80 and F1-score≃0.89 versus MCC≃0.29 and F1-score≃0.50). TSSPlant is available to download as a standalone program at http://www.cbrc.kaust.edu.sa/download/.
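The two accuracy measures quoted in the abstract, the Matthews correlation coefficient (MCC) and the F1-score, are both computed from the counts of a binary confusion matrix (true/false positives and negatives). The minimal Python sketch below shows the standard formulas. The function names and the confusion-matrix counts are hypothetical illustrations, not TSSPlant code or data from the study, and the sketch does not assume how the authors defined positive and negative examples.

```python
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall.

    Returns 0.0 when there are no positive predictions and no positive labels.
    """
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient:
    (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when any marginal sum is zero (a common convention).
    """
    num = tp * tn - fp * fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / denom if denom else 0.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not results from the paper).
    tp, tn, fp, fn = 90, 85, 15, 10
    print(f"F1  = {f1_score(tp, fp, fn):.3f}")
    print(f"MCC = {mcc(tp, tn, fp, fn):.3f}")
```

Unlike F1, MCC also uses the true-negative count, so it reflects performance on both classes; this is one reason it is often reported alongside F1 for promoter prediction, where non-promoter sequences can greatly outnumber promoters.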
Original language | English (US) |
---|---|
Pages (from-to) | gkw1353 |
Journal | Nucleic Acids Research |
Volume | 45 |
Issue number | 8 |
DOIs | |
State | Published - Jan 12 2017 |
Bibliographical note
KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): URF/1/1976-02, FCS/1/2448-01
Acknowledgements: King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) [URF/1/1976-02, FCS/1/2448-01]; Science Development Foundation under the President of the Republic of Azerbaijan [Grant EİF-2010-1(1)-40/27-3]. Funding for open access charge: King Abdullah University of Science and Technology (Award Nos. URF/1/1976-02 and FCS/1/2448-01).