Sequence alignment kernel for recognition of promoter regions

Leo Gordon*, Alexey Ya Chervonenkis, Alex J. Gammerman, Ilham A. Shahmuradov, Victor V. Solovyev

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

109 Scopus citations


In this paper we propose a new method for recognition of prokaryotic promoter regions with startpoints of transcription. The method is based on the Sequence Alignment Kernel, a function reflecting the quantitative measure of match between two sequences. This kernel function is then used in a Dual SVM, which performs the recognition. Several recognition methods have been trained and tested on a positive data set, consisting of 669 σ70-promoter regions with known transcription startpoints from Escherichia coli, and two negative data sets of 709 examples each, taken from coding and non-coding regions of the same genome. The results show that our method performs well and achieves a 16.5% average error rate on the positive and coding negative data and an 18.6% average error rate on the positive and non-coding negative data.
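To make the idea of a kernel built from sequence alignments concrete, here is a minimal Python sketch. It is not the paper's Sequence Alignment Kernel, whose exact definition is not given in this abstract. It is a toy similarity that sums, over all global alignments of two sequences, the product of per-column weights. The function names and the `match`, `mismatch` and `gap` weights are hypothetical choices made for illustration, and no claim is made that the resulting matrix is positive semidefinite.

```python
from math import sqrt


def alignment_score_sum(s, t, match=2.0, mismatch=0.5, gap=0.25):
    """Sum, over all global alignments of s and t, of the product of
    per-column weights (match, mismatch, or gap). Weights are illustrative."""
    # Row 0: the prefix of t is aligned entirely against gaps.
    prev = [gap ** j for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [gap ** i]  # column 0: the prefix of s is aligned against gaps
        for j in range(1, len(t) + 1):
            w = match if s[i - 1] == t[j - 1] else mismatch
            # Last column is either a (mis)match, a gap in t, or a gap in s.
            cur.append(w * prev[j - 1] + gap * prev[j] + gap * cur[j - 1])
        prev = cur
    return prev[-1]


def normalized_kernel(s, t, **weights):
    """Scale the raw similarity so that every sequence scores 1 with itself."""
    return alignment_score_sum(s, t, **weights) / sqrt(
        alignment_score_sum(s, s, **weights) * alignment_score_sum(t, t, **weights)
    )


def gram_matrix(seqs, **weights):
    """Pairwise similarity matrix over a list of sequences."""
    return [[normalized_kernel(a, b, **weights) for b in seqs] for a in seqs]


if __name__ == "__main__":
    examples = ["TATAAT", "TATACT", "GCGCGC"]  # made-up hexamers, not real data
    for row in gram_matrix(examples):
        print(["%.4f" % value for value in row])
```

A matrix of this kind could then be passed to any dual SVM solver that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`. Whether that matches the training setup used in the paper is an assumption.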

Original language: English (US)
Pages (from-to): 1964-1971
Number of pages: 8
Issue number: 15
State: Published - Oct 12 2003
Externally published: Yes

Bibliographical note

Funding Information:
The authors are grateful to Heladia Salgado, member of the RegulonDB (Salgado et al., 2000 Computational_Genomics/regulondb/) database support group, and Jose Carlos Gonzalez of Universidad Politecnica de Madrid (UPM) for useful discussions, and to Ruti Hershberg, member of the PromEC (Hershberg et al., 2001) database support group, for the data for the experiments and other useful information. We also wish to thank the Spanish Ministerio de Educacion, Cultura y Deporte and the School of Telecommunications, UPM, for their support through grant no. SAB2001-0057. We also wish to thank the anonymous referees for their valuable comments that helped us to improve the paper.

Funding Information:
This work is supported by BBSRC grant no. 111/BIO14428, ‘Pattern recognition techniques for gene identification in plant genomic sequences’ and EPSRC grant GR/M14937, ‘Predictive complexity: recursion-theoretic variants’.

ASJC Scopus subject areas

  • Statistics and Probability
  • Biochemistry
  • Molecular Biology
  • Computer Science Applications
  • Computational Theory and Mathematics
  • Computational Mathematics

