The Author-Topic Model

Olavur Mortensen

Abstract: The goal of this thesis is to develop a scalable and user-friendly implementation of the author-topic model for the Gensim framework. To this end, a variational Bayes (VB) algorithm is developed to train the model.
In order to allow online training, stochastic variational inference is applied. This removes the need to store all documents in memory, and allows us to keep learning on new data.
Maximum Likelihood Estimation is applied to automatically learn the optimal hyperparameters of the priors over words and over topics.
A blocking VB method, inspired by blocking Gibbs sampling, is applied; it relaxes the assumptions made about the form of the posterior in the variational approximation. The resulting algorithm lends itself to optimizations that decrease its memory complexity and speed up training by vectorizing the VB updates. Blocking VB also increases the variational lower bound more per iteration than standard VB.
To illustrate useful examples of the model and demonstrate usage of the software, a tutorial has been written and is accessible online. The tutorial uses data exploration and similarity queries to gain insight into the authors in a dataset of scientific papers from the NIPS conference.
Type: Master's thesis [Academic thesis]
Year: 2017
Publisher: Technical University of Denmark, Department of Applied Mathematics and Computer Science
Address: Richard Petersens Plads, Building 324, DK-2800 Kgs. Lyngby, Denmark, compute@compute.dtu.dk
Series: DTU Compute M.Sc.-2017
Note: Supervised by Ole Winther, olwi@dtu.dk, DTU Compute
Electronic version(s): [pdf]
Publication link: http://www.compute.dtu.dk/English.aspx
BibTeX data: [bibtex]
IMM Group(s): Intelligent Signal Processing