TY - JOUR
T1 - A Sequential Algorithm for Fast Fitting of Dirichlet Process Mixture Models
AU - Zhang, Xiaole
AU - Nott, David J.
AU - Yau, Christopher
AU - Jasra, Ajay
N1 - Generated from Scopus record by KAUST IRTS on 2019-11-20
PY - 2014/1/1
Y1 - 2014/1/1
N2 - In this article, we propose an improvement on the sequential updating and greedy search (SUGS) algorithm for fast fitting of Dirichlet process mixture models. The SUGS algorithm provides a means for very fast approximate Bayesian inference for mixture data which is particularly of use when datasets are so large that many standard Markov chain Monte Carlo (MCMC) algorithms cannot be applied efficiently, or take a prohibitively long time to converge. In particular, these ideas are used to initially interrogate the data, and to refine models such that one can potentially apply exact data analysis later on. SUGS relies upon sequentially allocating data to clusters and proceeding with an update of the posterior on the subsequent allocations and parameters which assumes this allocation is correct. Our modification softens this approach, by providing a probability distribution over allocations, with a similar computational cost; this approach has an interpretation as a variational Bayes procedure and hence we term it variational SUGS (VSUGS). It is shown in simulated examples that VSUGS can outperform, in terms of density estimation and classification, a version of the SUGS algorithm in many scenarios. In addition, we present a data analysis for flow cytometry data, and SNP data via a three-class Dirichlet process mixture model, illustrating the apparent improvement over the original SUGS algorithm.
AB - In this article, we propose an improvement on the sequential updating and greedy search (SUGS) algorithm for fast fitting of Dirichlet process mixture models. The SUGS algorithm provides a means for very fast approximate Bayesian inference for mixture data which is particularly of use when datasets are so large that many standard Markov chain Monte Carlo (MCMC) algorithms cannot be applied efficiently, or take a prohibitively long time to converge. In particular, these ideas are used to initially interrogate the data, and to refine models such that one can potentially apply exact data analysis later on. SUGS relies upon sequentially allocating data to clusters and proceeding with an update of the posterior on the subsequent allocations and parameters which assumes this allocation is correct. Our modification softens this approach, by providing a probability distribution over allocations, with a similar computational cost; this approach has an interpretation as a variational Bayes procedure and hence we term it variational SUGS (VSUGS). It is shown in simulated examples that VSUGS can outperform, in terms of density estimation and classification, a version of the SUGS algorithm in many scenarios. In addition, we present a data analysis for flow cytometry data, and SNP data via a three-class Dirichlet process mixture model, illustrating the apparent improvement over the original SUGS algorithm.
UR - http://www.tandfonline.com/doi/abs/10.1080/10618600.2013.870906
UR - http://www.scopus.com/inward/record.url?scp=84908074448&partnerID=8YFLogxK
U2 - 10.1080/10618600.2013.870906
DO - 10.1080/10618600.2013.870906
M3 - Article
SN - 1537-2715
VL - 23
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
IS - 4
ER -