Abstract
Mapping the DNA-binding preferences of transcription factor (TF) complexes is critical for deciphering the functions of cis-regulatory elements. Here, we developed a computational method that compares co-occurring motif spacings in conserved versus unconserved regions of the human genome to detect evolutionarily constrained binding sites of rigid TF complexes. Structural data were used to estimate TF complex physical plausibility, explore overlapping motif arrangements seldom tackled by non-structure-aware methods, and generate and analyse three-dimensional models of the predicted complexes bound to DNA. Using this approach, we predicted 422 physically realistic TF complex motifs at 18% false discovery rate, the majority of which (326, 77%) contain some sequence overlap between binding sites. The set of mostly novel complexes is enriched in known composite motifs, predictive of binding site configurations in TF-TF-DNA crystal structures, and supported by ChIP-seq datasets. Structural modelling revealed three cooperativity mechanisms: direct protein-protein interactions, potentially indirect interactions and 'through-DNA' interactions. Indeed, 38% of the predicted complexes were found to contain four or more bases in which TF pairs appear to synergize through overlapping binding to the same DNA base pairs in opposite grooves or strands. Our TF complex and associated binding site predictions are available as a web resource at http://bejerano.stanford.edu/complex.
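The abstract does not say which statistical test is used to compare motif spacings, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that occurrences of a motif pair have already been counted per spacing in conserved and unconserved regions. It then asks, with a one-sided hypergeometric test, whether any spacing is over-represented in the conserved set, and controls the false discovery rate with the Benjamini-Hochberg procedure. The function names and the toy counts are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_tail(k, K, n, N):
    """P(X >= k) when n items are drawn from N, of which K are 'successes'."""
    upper = min(K, n)
    total = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return total / comb(N, n)


def spacing_enrichment(conserved, unconserved):
    """One-sided p-value per spacing for over-representation in conserved regions.

    Both arguments map a spacing (in bp) to the number of motif-pair
    occurrences observed at that spacing (assumed to be precomputed).
    """
    n_cons = sum(conserved.values())
    n_total = n_cons + sum(unconserved.values())
    pvals = {}
    for spacing in sorted(set(conserved) | set(unconserved)):
        k = conserved.get(spacing, 0)          # conserved pairs at this spacing
        K = k + unconserved.get(spacing, 0)    # all pairs at this spacing
        pvals[spacing] = hypergeom_tail(k, K, n_cons, n_total)
    return pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted q-values, keyed like the input."""
    items = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(items)
    qvals = {}
    running_min = 1.0
    # Walk from the largest p-value down so each q-value is monotone.
    for rank in range(m, 0, -1):
        key, p = items[rank - 1]
        running_min = min(running_min, p * m / rank)
        qvals[key] = running_min
    return qvals


# Toy counts, invented for illustration only (spacing in bp -> occurrences).
conserved = {0: 2, 1: 3, 2: 30, 3: 2, 4: 3}
unconserved = {0: 40, 1: 38, 2: 41, 3: 42, 4: 39}

q = benjamini_hochberg(spacing_enrichment(conserved, unconserved))
for spacing, qval in sorted(q.items()):
    print(f"spacing {spacing} bp: q = {qval:.3g}")
```

In this toy input only the 2 bp spacing is strongly over-represented among conserved occurrences, which is the kind of signal a fixed-spacing (rigid) complex would be expected to produce under these assumptions.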
Original language | English (US) |
---|---|
Pages (from-to) | 20130029 |
Journal | Philosophical Transactions of the Royal Society B: Biological Sciences |
Volume | 368 |
Issue number | 1632 |
State | Published - Nov 11 2013 |
Externally published | Yes |
Bibliographical note
KAUST Repository Item: Exported on 2020-10-01
Acknowledgements: H.G. is supported by National Science Foundation Fellowship DGE-1147470. A.C.D. is supported by a Natural Sciences and Engineering Research Council of Canada Postdoctoral Fellowship (PDF). A.M.W. is supported by a Bio-X Stanford Interdisciplinary Graduate Fellowship. G.B. is supported by NIH grants R01HG005058 and R01HD059862 and KAUST. G.B. is a Packard Fellow and Microsoft Research Fellow. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.