Abstract
A new hierarchical tree-based topic model is developed, based on nonparametric Bayesian techniques. The model has two unique attributes: (i) a child node in the tree may have more than one parent, with the goal of eliminating redundant sub-topics deep in the tree; and (ii) sub-topics are kept parsimonious by removing redundant usage of words at multiple scales. The depth and width of the tree are unbounded within the prior, and a retrospective sampler is employed to adaptively infer the appropriate tree size from the corpus under study. The model yields excellent quantitative results on five standard data sets, and the inferred tree structure is also found to be highly interpretable. Copyright 2011 by the author(s)/owner(s).
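The abstract states that the tree's depth and width are unbounded under the prior and that a retrospective sampler infers the tree size. As a rough illustration of that general idea only, and not the paper's model or sampler, the sketch below applies retrospective sampling to a single stick-breaking prior: a uniform variate is drawn first, and weights are instantiated only until their cumulative mass covers it. The class name `LazyStickBreaking`, the parameter `alpha`, and the Beta(1, alpha) stick proportions are assumptions chosen for this example.

```python
import random


class LazyStickBreaking:
    """Stick-breaking weights that are instantiated only when a draw needs them.

    Illustrative assumption: a plain Beta(1, alpha) stick-breaking prior,
    not the tree prior described in the abstract.
    """

    def __init__(self, alpha, seed=0):
        self.alpha = alpha              # concentration parameter (assumed)
        self.rng = random.Random(seed)
        self.weights = []               # weights instantiated so far
        self.remaining = 1.0            # stick mass not yet broken off

    def _extend(self):
        # Break off a Beta(1, alpha) fraction of the remaining stick.
        v = self.rng.betavariate(1.0, self.alpha)
        self.weights.append(self.remaining * v)
        self.remaining *= 1.0 - v

    def sample(self):
        # Draw u first, then add components until the cumulative mass covers u.
        u = self.rng.random()
        cumulative = 0.0
        k = 0
        while True:
            if k == len(self.weights):
                self._extend()
            cumulative += self.weights[k]
            # The second condition guards against floating-point underflow.
            if u < cumulative or self.remaining == 0.0:
                return k
            k += 1


sampler = LazyStickBreaking(alpha=2.0, seed=1)
draws = [sampler.sample() for _ in range(200)]
```

Only as many components exist as the draws have required, so the represented size grows with the data instead of being fixed in advance. A full sampler for a tree prior would apply the same idea at every node, which this sketch does not attempt.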
Original language | English (US) |
---|---|
Title of host publication | Proceedings of the 28th International Conference on Machine Learning, ICML 2011 |
Pages | 377-384 |
Number of pages | 8 |
State | Published - Oct 7 2011 |
Externally published | Yes |