@MASTERSTHESIS{IMM2017-06971,
  author   = "O. Mortensen",
  title    = "The Author-Topic Model",
  year     = "2017",
  school   = "Technical University of Denmark, Department of Applied Mathematics and Computer Science",
  address  = "Richard Petersens Plads, Building 324, {DK-}2800 Kgs. Lyngby, Denmark, compute@compute.dtu.dk",
  type     = "",
  note     = "Supervised by Ole Winther, olwi@dtu.dk, {DTU} Compute",
  url      = "http://www.compute.dtu.dk/English.aspx",
  abstract = "The goal of this thesis is to develop a scalable and user-friendly implementation of the author-topic model for the Gensim framework. To this end, a variational Bayes (VB) algorithm is developed to train the model. In order to allow online training, stochastic variational inference is applied. This removes the need to store all documents in memory, and allows us to keep learning on new data. Maximum Likelihood Estimation is applied to automatically learn the optimal hyperparameters of the priors over words and over topics. A blocking {VB} method is applied, inspired by blocking Gibbs sampling, that relaxes the assumptions we make about the form of the posterior in the variational approximation. The resulting algorithm lends itself to optimizations that decrease the memory complexity of the algorithm, and speed up training by vectorizing the {VB} updates. Blocking {VB} also increases the variational lower bound more per iteration than standard {VB}. In order to illustrate useful examples of the model, as well as demonstrate usage of the software, a tutorial is written and is accessible online. This tutorial uses data exploration and similarity queries to gain insight about authors in a dataset consisting of scientific papers from the {NIPS} conference."
}