Abstract
We consider the problem of estimating the density of a random variable when precise measurements on the variable are not available, but replicated proxies contaminated with measurement error are available for sufficiently many subjects. Under the assumption of additive measurement errors, this reduces to a problem of deconvolution of densities. Deconvolution methods often make restrictive and unrealistic assumptions about the density of interest and the distribution of measurement errors, for example, normality and homoscedasticity of the errors, and thus their independence from the variable of interest. This article relaxes these assumptions and introduces novel Bayesian semiparametric methodology based on Dirichlet process mixture models for robust deconvolution of densities in the presence of conditionally heteroscedastic measurement errors. In particular, the models can adapt to asymmetry, heavy tails, and multimodality. In simulation experiments, we show that our methods vastly outperform a recent Bayesian approach based on estimating the densities via mixtures of splines. We apply our methods to data from nutritional epidemiology. Even in the special case when the measurement errors are homoscedastic, our methodology is novel and dominates other methods that have been proposed previously. Additional simulation results, instructions on getting access to the data set, and R programs implementing our methods are included as part of online supplemental materials.
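The data structure described in the abstract can be illustrated with a minimal simulation sketch. It is not the article's method and performs no deconvolution: it only generates replicated proxies W_ij = X_i + U_ij whose error spread depends on X_i, and shows why naive summaries of the proxies misrepresent the density of X. The gamma latent distribution, the linear standard-deviation function, and all names below are assumptions chosen purely for illustration.

```python
import random
import statistics


def simulate_replicates(n_subjects=2000, n_reps=3, seed=0):
    """Simulate replicated proxies W_ij = X_i + U_ij (illustrative only)."""
    rng = random.Random(seed)
    x, w = [], []
    for _ in range(n_subjects):
        # Assumed latent density: right-skewed gamma (shape 4, scale 0.5).
        xi = rng.gammavariate(4.0, 0.5)
        # Assumed variance function: error s.d. grows with the latent value,
        # i.e. conditionally heteroscedastic measurement error.
        sd = 0.3 + 0.4 * xi
        x.append(xi)
        w.append([xi + rng.gauss(0.0, sd) for _ in range(n_reps)])
    return x, w


x, w = simulate_replicates()
w_bar = [statistics.fmean(row) for row in w]          # subject-level means
within_var = [statistics.variance(row) for row in w]  # per-subject error variance

# The subject means are more dispersed than the latent values they proxy.
var_x = statistics.pvariance(x)
var_w_bar = statistics.pvariance(w_bar)

# Heteroscedasticity check: compare the average within-subject variance
# between subjects with low and high mean proxy values.
order = sorted(range(len(w_bar)), key=lambda i: w_bar[i])
half = len(order) // 2
low_half_var = statistics.fmean(within_var[i] for i in order[:half])
high_half_var = statistics.fmean(within_var[i] for i in order[half:])

print(f"Var(X) = {var_x:.2f}, Var(subject means) = {var_w_bar:.2f}")
print(f"mean within-subject variance: low half = {low_half_var:.2f}, "
      f"high half = {high_half_var:.2f}")
```

In this sketch the subject means overstate the spread of X, and the within-subject variability rises with the level of the proxies. That dependence is exactly what a homoscedastic error model would ignore.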
Original language | English (US) |
---|---|
Pages (from-to) | 1101-1125 |
Number of pages | 25 |
Journal | Journal of Computational and Graphical Statistics |
Volume | 23 |
Issue number | 4 |
State | Published - Oct 20 2014 |
Externally published | Yes |
Bibliographical note
KAUST Repository Item: Exported on 2020-10-01
Acknowledged KAUST grant number(s): KUS-CI-016-04
Acknowledgements: Carroll's research was supported in part by grants R37-CA057030 and R25T-CA090301 from the National Cancer Institute. Mallick's research was supported in part by National Science Foundation grant DMS0914951. Staudenmayer's work was supported in part by NIH grants CA121005 and R01-HL099557. The authors thank Jeff Hart, John P. Buonaccorsi, and Susanne M. Schennach for their helpful suggestions. The authors also acknowledge the Texas A&M University Brazos HPC cluster that contributed to the research reported here. This publication is based in part on work supported by Award Number KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST).
This publication acknowledges KAUST support, but has no KAUST affiliated authors.