Abstract
Genome-wide association studies have been successful in identifying single nucleotide polymorphisms (SNPs) associated with a large number of phenotypes. However, an associated SNP is likely part of a larger region of linkage disequilibrium. This makes it difficult to precisely identify the SNPs that have a biological link with the phenotype. We have systematically investigated the association of multiple types of ENCODE data with disease-associated SNPs and show that there is significant enrichment for functional SNPs among the currently identified associations. This enrichment is strongest when multiple sources of functional information are integrated and when the highest-confidence disease-associated SNPs are used. We propose an approach that integrates multiple types of functional data generated by the ENCODE Consortium to help identify "functional SNPs" that may be associated with the disease phenotype. Our approach generates putative functional annotations for up to 80% of all previously reported associations. We show that for most associations, the functional SNP most strongly supported by experimental evidence is a SNP in linkage disequilibrium with the reported association rather than the reported SNP itself. Our results show that the experimental data sets generated by the ENCODE Consortium can be successfully used to suggest functional hypotheses for variants associated with diseases and other phenotypes.
Original language | English (US) |
---|---|
Pages (from-to) | 1748-1759 |
Number of pages | 12 |
Journal | Genome Research |
Volume | 22 |
Issue number | 9 |
State | Published - Sep 5 2012 |
Externally published | Yes |
Bibliographical note
KAUST Repository Item: Exported on 2020-10-01

Acknowledgements: We thank Ross Hardison, Ewan Birney, Jason Ernst, Konrad Karczewski, Manoj Hariharan, and the members of the Batzoglou laboratory for suggestions and comments. We thank the anonymous reviewers for valuable feedback and suggestions. We thank the ENCODE Consortium, the Office of Population Genomics at the National Human Genome Research Institute, the HapMap Consortium, and the Genome Bioinformatics Group at the University of California–Santa Cruz for generating the data and tools used in this work. This work was supported in part by the ENCODE Consortium under Grant No. NIH 5U54 HG 004558, by the National Science Foundation under Grant No. 0640211, funding from the Beta Cell Consortium, and by a King Abdullah University of Science and Technology research grant. M.A.S. was supported in part by a Richard and Naomi Horowitz Stanford Graduate Fellowship. A.K. was partially supported by an ENCODE analysis subcontract.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.