Approximating The Dirichlet Compound Multinomial Distribution

Rasmus Elsborg Madsen, David Kauchak, Charles Elkan

Abstract: We investigate the Dirichlet compound multinomial (DCM), which has
recently been shown to be a good model for word burstiness in documents.
We provide a number of conceptual explanations that account for
these recent results. We then derive an exponential family approximation
of the DCM that is substantially faster to train, while still producing
similar probabilities and classification performance. We also investigate
Fisher kernels based on the DCM model as a way to generate
distribution-based similarity scores. Initial experiments show promise for
this type of similarity method.
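
For readers unfamiliar with the DCM, the following is a minimal sketch of its log-probability for a word-count vector, using the standard Polya form: the multinomial coefficient times Gamma(A) / Gamma(A + n) times the product over words of Gamma(x_w + alpha_w) / Gamma(alpha_w), where A is the sum of the alpha_w and n is the document length. The function name `dcm_log_prob` and the toy parameter values are illustrative assumptions, not taken from the paper. This is the exact DCM, not the exponential family approximation the paper derives.

```python
from math import lgamma, exp


def dcm_log_prob(counts, alpha):
    """Log-probability of a word-count vector under the DCM (Polya) distribution.

    counts: non-negative integer word counts, one per vocabulary word.
    alpha:  positive Dirichlet parameters, one per vocabulary word.
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    n = sum(counts)      # document length
    a_sum = sum(alpha)   # sum of the Dirichlet parameters

    # log of the multinomial coefficient n! / prod_w x_w!
    log_p = lgamma(n + 1) - sum(lgamma(x + 1) for x in counts)
    # log of Gamma(A) / Gamma(A + n)
    log_p += lgamma(a_sum) - lgamma(a_sum + n)
    # log of prod_w Gamma(x_w + alpha_w) / Gamma(alpha_w)
    log_p += sum(lgamma(x + a) - lgamma(a) for x, a in zip(counts, alpha))
    return log_p


# Toy two-word vocabulary with small alphas (hypothetical values).
small_alpha = [0.1, 0.1]
p_bursty = exp(dcm_log_prob([2, 0], small_alpha))  # same word used twice
p_mixed = exp(dcm_log_prob([1, 1], small_alpha))   # each word used once
```

With these toy parameters the repeated-word document `[2, 0]` receives probability of about 0.458, against about 0.083 for `[1, 1]`. A multinomial with equal word probabilities would instead give 0.25 and 0.5, which illustrates the burstiness the abstract refers to.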
Keywords: DCM, Dirichlet, Polya, Text mining
Type: Conference paper [Submitted]
Conference: Neural Information Processing Systems
Year: 2005    Month: December
IMM Group(s): Intelligent Signal Processing