TY - GEN
T1 - The contextual focused topic model
AU - Chen, Xu
AU - Zhou, Mingyuan
AU - Carin, Lawrence
N1 - Generated from Scopus record by KAUST IRTS on 2021-02-09
PY - 2012/9/14
Y1 - 2012/9/14
N2 - A nonparametric Bayesian contextual focused topic model (cFTM) is proposed. The cFTM infers a sparse ("focused") set of topics for each document, while also leveraging contextual information about the author(s) and document venue. The hierarchical beta process, coupled with a Bernoulli process, is employed to infer the focused set of topics associated with each author and venue; the same construction is also employed to infer those topics associated with a given document that are unusual (termed "random effects"), relative to topics that are inferred as probable for the associated author(s) and venue. To leverage statistical strength and infer latent interrelationships between authors and venues, the Dirichlet process is utilized to cluster authors and venues. The cFTM automatically infers the number of topics needed to represent the corpus, the number of author and venue clusters, and the probabilistic importance of the author, venue and random-effect information on word assignment for a given document. Efficient MCMC inference is presented. Example results and interpretations are presented for two real datasets, demonstrating promising performance, with comparison to other state-of-the-art methods. © 2012 ACM.
AB - A nonparametric Bayesian contextual focused topic model (cFTM) is proposed. The cFTM infers a sparse ("focused") set of topics for each document, while also leveraging contextual information about the author(s) and document venue. The hierarchical beta process, coupled with a Bernoulli process, is employed to infer the focused set of topics associated with each author and venue; the same construction is also employed to infer those topics associated with a given document that are unusual (termed "random effects"), relative to topics that are inferred as probable for the associated author(s) and venue. To leverage statistical strength and infer latent interrelationships between authors and venues, the Dirichlet process is utilized to cluster authors and venues. The cFTM automatically infers the number of topics needed to represent the corpus, the number of author and venue clusters, and the probabilistic importance of the author, venue and random-effect information on word assignment for a given document. Efficient MCMC inference is presented. Example results and interpretations are presented for two real datasets, demonstrating promising performance, with comparison to other state-of-the-art methods. © 2012 ACM.
UR - http://dl.acm.org/citation.cfm?doid=2339530.2339549
UR - http://www.scopus.com/inward/record.url?scp=84866008256&partnerID=8YFLogxK
U2 - 10.1145/2339530.2339549
DO - 10.1145/2339530.2339549
M3 - Conference contribution
SN - 9781450314626
SP - 96
EP - 104
BT - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
ER -