Abstract
Massively Parallel Signature Sequencing (MPSS) is a high-throughput, counting-based technology available for gene expression profiling. It produces output that is similar to Serial Analysis of Gene Expression and is ideal for building complex relational databases for gene expression. Our goal is to compare the in vivo global gene expression profiles of tissues infected with different strains of Salmonella obtained using the MPSS technology. In this article, we develop an exact ANOVA-type model for this count data using a zero-inflated Poisson distribution, different from existing methods that assume continuous densities. We adopt two Bayesian hierarchical models, one parametric and the other semiparametric with a Dirichlet process prior that has the ability to "borrow strength" across related signatures, where a signature is a specific arrangement of the nucleotides, usually 16-21 base pairs long. We utilize the discreteness of the Dirichlet process prior to cluster signatures that exhibit similar differential expression profiles. Tests for differential expression are carried out using nonparametric approaches, while controlling the false discovery rate. We identify several differentially expressed genes that have important biological significance and conclude with a summary of the biological discoveries. This article has supplementary materials online. © 2010 American Statistical Association.
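For readers unfamiliar with the zero-inflated Poisson distribution named in the abstract, the following is a minimal sketch of its textbook probability mass function. It is not the authors' hierarchical model: the function name `zip_pmf` and the parameters `lam` (Poisson rate) and `pi` (probability of a structural zero) are assumptions chosen for illustration only.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) for a zero-inflated Poisson with rate lam and extra-zero probability pi.

    Illustrative only: names and parameterization are assumptions, not the paper's notation.
    """
    if k < 0:
        return 0.0
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    # A zero count can arise from the structural-zero component or from the Poisson itself.
    return pi * (k == 0) + (1.0 - pi) * poisson

# Probabilities over a range wide enough that the remaining tail mass is negligible.
probs = [zip_pmf(k, lam=2.0, pi=0.3) for k in range(50)]
```

With `pi > 0`, the model places more mass on a count of zero than a plain Poisson with the same rate, which is the usual motivation for using it on signature counts that contain many zeros.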
Original language | English (US) |
---|---|
Pages (from-to) | 956-967 |
Number of pages | 12 |
Journal | Journal of the American Statistical Association |
Volume | 105 |
Issue number | 491 |
State | Published - Sep 2010 |
Externally published | Yes |
Bibliographical note
KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): KUS-CI-016-04
Acknowledgements: Soma S. Dhavala is a Doctoral Candidate, Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843 (E-mail: [email protected]). Sujay Datta is Senior Scientist and Faculty Member, Statistical Center for HIV/AIDS Research and Prevention, Fred Hutchinson Cancer Research Center, M2-C125, 1100 Fairview Avenue N., Seattle, WA 98109 (E-mail: [email protected]). Bani K. Mallick is Professor, Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843 (E-mail: [email protected]). Raymond J. Carroll is Distinguished Professor, Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX 77843 (E-mail: [email protected]). Sangeeta Khare is Research Assistant Professor, Department of Veterinary Pathobiology, Texas A&M University, 4467 TAMU, College Station, TX 77843 (E-mail: [email protected]). Sara D. Lawhon is Assistant Professor, Department of Veterinary Pathobiology, Texas A&M University, 4467 TAMU, College Station, TX 77843 (E-mail: [email protected]). L. Garry Adams is Professor, Department of Veterinary Pathobiology, Texas A&M University, 4467 TAMU, College Station, TX 77843 (E-mail: [email protected]). The research of Bani K. Mallick and Raymond J. Carroll was supported by National Cancer Institute grants (CA 104620 and CA57030, respectively), National Science Foundation grant DMS 0914951, and by award KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST). The research of Sujay Datta was supported by a postdoctoral training grant from the National Cancer Institute (CA90301). The research of L. Garry Adams was supported by the grants NIAID 1 RO1 A144170-01A1, USDA 2002-35204-12247, and NSF DMS 0914951. Public Health Service grant AI060933 supported the research of Sara D. Lawhon. The authors are grateful to Dr. David Dahl for discussions, and to the editors and the two anonymous referees for their suggestions and constructive comments.
This publication acknowledges KAUST support, but has no KAUST affiliated authors.