Abstract
A new approach is proposed for topic modeling, in which the latent matrix factorization employs Gaussian priors, rather than the Dirichlet-class priors widely used in such models. The use of a latent-Gaussian model permits simple and efficient approximate Bayesian posterior inference, via the Laplace approximation. On multiple datasets, the proposed approach is demonstrated to yield results as accurate as state-of-the-art approaches based on Dirichlet constructions, at a small fraction of the computation. The framework is general enough to jointly model text and binary data, here demonstrated to produce accurate and fast results for joint analysis of voting rolls and the associated legislative text. Further, it is demonstrated how the technique may be scaled up to massive data, with encouraging performance relative to alternative methods.
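The core idea named in the abstract, approximating the posterior over latent Gaussian variables with the Laplace approximation, can be illustrated with a minimal, generic sketch. The snippet below is not the paper's model or code; it uses a toy Poisson-count likelihood with a Gaussian prior, and the function names (`log_joint`, `laplace_approx`) are illustrative assumptions only. It shows the two standard steps: find the posterior mode, then fit a Gaussian whose covariance is the inverse Hessian of the negative log joint at that mode.

```python
# Generic Laplace approximation sketch: Gaussian prior w ~ N(0, prior_var * I)
# with a Poisson likelihood on counts (loosely echoing count data in topic
# models). Illustrative only; not the construction used in the paper.
import numpy as np
from scipy.optimize import minimize

def log_joint(w, X, y, prior_var=1.0):
    """Unnormalized log posterior: Poisson log-likelihood plus Gaussian log-prior."""
    eta = X @ w                                 # linear predictor
    loglik = np.sum(y * eta - np.exp(eta))      # Poisson log-likelihood (up to a constant)
    logprior = -0.5 * np.dot(w, w) / prior_var  # isotropic Gaussian prior
    return loglik + logprior

def laplace_approx(X, y, prior_var=1.0):
    """Return (mean, cov) of the Gaussian that Laplace-approximates the posterior."""
    d = X.shape[1]
    # 1) Posterior mode (MAP estimate): maximize the log joint.
    obj = lambda w: -log_joint(w, X, y, prior_var)
    w_map = minimize(obj, np.zeros(d), method="L-BFGS-B").x
    # 2) Hessian of the negative log joint at the mode:
    #    X^T diag(exp(X w)) X + (1 / prior_var) I for this Poisson/Gaussian pair.
    lam = np.exp(X @ w_map)
    H = X.T @ (lam[:, None] * X) + np.eye(d) / prior_var
    # 3) Laplace approximation: mean = mode, covariance = inverse Hessian.
    return w_map, np.linalg.inv(H)

# Toy usage with synthetic count data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.poisson(np.exp(X @ np.array([0.5, -0.3, 0.2])))
mean, cov = laplace_approx(X, y)
```

Because the approximate posterior is obtained from a single optimization plus one Hessian factorization, inference of this kind is typically much cheaper than sampling-based inference for Dirichlet-class models, which is the efficiency argument the abstract makes.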
| Original language | English (US) |
| --- | --- |
| Title of host publication | Journal of Machine Learning Research |
| Publisher | Microtome Publishing |
| Pages | 393-401 |
| Number of pages | 9 |
| State | Published - Jan 1 2014 |
| Externally published | Yes |