Approximating The Dirichlet Compound Multinomial Distribution |
Rasmus Elsborg Madsen, David Kauchak, Charles Elkan |
Abstract | We investigate the Dirichlet compound multinomial (DCM), which has
recently been shown to be a good model for word burstiness in documents.
We provide a number of conceptual explanations that account for
these recent results. We then derive an exponential family approximation
of the DCM that is substantially faster to train, while still producing
similar probabilities and classification performance. We also investigate
Fisher kernels using the DCM model for generating distributionally
based similarity scores. Initial experiments show promise for this type of
similarity method. |
Keywords | DCM, Dirichlet, Polya, Text mining |
Type | Conference paper [Submitted] |
Conference | Neural Information Processing Systems |
Year | 2005 |
Month | December |
IMM Group(s) | Intelligent Signal Processing |