TY - JOUR
T1 - HOCOMOCO: expansion and enhancement of the collection of transcription factor binding sites models
AU - Kulakovskiy, Ivan V.
AU - Vorontsov, Ilya E.
AU - Yevshin, Ivan S.
AU - Soboleva, Anastasiia V.
AU - Kasianov, Artem S.
AU - Ashoor, Haitham
AU - Ba Alawi, Wail
AU - Bajic, Vladimir B.
AU - Medvedeva, Yulia A.
AU - Kolpakov, Fedor A.
AU - Makeev, Vsevolod J.
N1 - KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: We thank the Evolutionary Genomics Laboratory, Faculty of Bioengineering and Bioinformatics (M.V. Lomonosov Moscow State University) and personally Prof. A.S. Kondrashov for computational facilities.
PY - 2015/11/19
Y1 - 2015/11/19
AB - Models of transcription factor (TF) binding sites provide a basis for a wide spectrum of studies in regulatory genomics, from reconstruction of regulatory networks to functional annotation of transcripts and sequence variants. While TFs may recognize different sequence patterns in different conditions, it is pragmatic to have a single generic model for each particular TF as a baseline for practical applications. Here we present the expanded and enhanced version of HOCOMOCO (http://hocomoco.autosome.ru and http://www.cbrc.kaust.edu.sa/hocomoco10), the collection of models of DNA patterns recognized by transcription factors. HOCOMOCO now provides position weight matrix (PWM) models for binding sites of 601 human TFs and, in addition, PWMs for 396 mouse TFs. Furthermore, we introduce the largest up-to-date collection of dinucleotide PWM (diPWM) models, covering 86 human and 52 mouse TFs. The update is based on the analysis of massive ChIP-Seq and HT-SELEX datasets, with the resulting models validated on in vivo data. To facilitate practical applications, all HOCOMOCO models are linked to gene and protein databases (Entrez Gene, HGNC, UniProt) and accompanied by precomputed score thresholds. Finally, we provide command-line tools for PWM and diPWM threshold estimation and motif finding in nucleotide sequences.
UR - http://hdl.handle.net/10754/613302
UR - http://nar.oxfordjournals.org/lookup/doi/10.1093/nar/gkv1249
UR - http://www.scopus.com/inward/record.url?scp=84976873260&partnerID=8YFLogxK
U2 - 10.1093/nar/gkv1249
DO - 10.1093/nar/gkv1249
M3 - Article
C2 - 26586801
SN - 0305-1048
VL - 44
SP - D116-D125
JO - Nucleic Acids Research
JF - Nucleic Acids Research
IS - D1
ER -