DNA-Prot: Identification of DNA binding proteins from protein sequence information using random forest

K. Krishna Kumar, Ganesan Pugalenthi, P. N. Suganthan*

*Corresponding author for this work

    Research output: Contribution to journal > Article > peer-review

    105 Scopus citations


    DNA-binding proteins (DNABPs) are important for various cellular processes, such as transcriptional regulation, recombination, replication, repair, and DNA modification. So far, various bioinformatics and machine learning techniques have been applied to identify DNA-binding proteins from protein structure, but only a few methods are available for identifying them from protein sequence. In this work, we report a random forest method, DNA-Prot, to identify DNA-binding proteins from protein sequence. Training was performed on a dataset containing 146 DNA-binding proteins and 250 non-DNA-binding proteins. The algorithm was tested on a dataset containing 92 DNA-binding proteins and 100 non-DNA-binding proteins. We obtained 80.31% accuracy from training and 84.37% accuracy from testing. Benchmarking analysis on an independent dataset of 823 DNA-binding proteins and 823 non-DNA-binding proteins shows that our approach can distinguish DNA-binding proteins from non-DNA-binding proteins with more than 80% accuracy. We also compared our method with the DNAbinder method on the test dataset and two independent datasets. Comparable performance was observed from both methods on the test dataset. On the benchmark dataset containing 823 DNA-binding proteins and 823 non-DNA-binding proteins, DNA-Prot performed significantly better, with 81.83% accuracy, whereas DNAbinder achieved only 61.42% accuracy using amino acid composition and 63.5% using PSSM profiles. Similarly, DNA-Prot achieved better performance on the benchmark dataset containing 88 DNA-binding proteins and 233 non-DNA-binding proteins. These results show that DNA-Prot can be used efficiently to identify DNA-binding proteins from sequence information. The dataset and standalone version of the DNA-Prot software can be obtained from http://www3.ntu.edu.sg/home/EPNSugan/index_files/dnaprot.htm.
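
As a rough illustration of the two quantities the abstract refers to, the sketch below computes an amino acid composition vector (the representation the abstract attributes to one DNAbinder variant) and a percentage accuracy over labelled predictions. It is not the DNA-Prot implementation: the abstract does not list the features DNA-Prot uses, and the function names `amino_acid_composition` and `accuracy` as well as the toy sequence are assumptions made for this example.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector is comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard residue in the sequence (20 values)."""
    # Non-standard symbols (e.g. X, gaps) are ignored; this is an assumption.
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def accuracy(true_labels, predicted_labels):
    """Percentage of proteins whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return 100.0 * correct / len(true_labels)


# Hypothetical toy sequence, not taken from the DNA-Prot datasets.
features = amino_acid_composition("MKRKAAKK")
```

Vectors of this kind, one per protein, could then be passed together with binding/non-binding labels to any random forest implementation (for example scikit-learn's `RandomForestClassifier`), and `accuracy` applied to its predictions on a held-out set.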

    Original language: English (US)
    Pages (from-to): 679-686
    Number of pages: 8
    Journal: Journal of Biomolecular Structure and Dynamics
    Issue number: 6
    State: Published - Jun 2009

    ASJC Scopus subject areas

    • Structural Biology
    • Molecular Biology
