Author and Topic Modelling in Text Data |
|
Abstract | This thesis deals with probabilistic modelling of authors, documents, and topics in textual data. The focus is on the Latent Dirichlet Allocation (LDA) model and the Author-Topic (AT) model, with Gibbs sampling used to infer model parameters from data. Furthermore, a method for optimising hyperparameters in an ML-II setting is described.
Model properties are discussed in connection with applications of the models, which include detecting unlikely documents among scientific papers from the NIPS conferences using document perplexity, and link prediction in the online social network Twitter, for which results are reported as area under the ROC curve (AUC) and compared to well-known graph-based methods. |
Type | Master's thesis [Academic thesis] |
Year | 2012 |
Publisher | Technical University of Denmark, DTU Informatics, E-mail: reception@imm.dtu.dk |
Address | Asmussens Alle, Building 305, DK-2800 Kgs. Lyngby, Denmark |
Series | IMM-M.Sc.-2012-67 |
Note | Supervised by Professor Lars Kai Hansen, lkh@imm.dtu.dk, DTU Informatics |
Electronic version(s) | [pdf] |
Publication link | http://www.imm.dtu.dk/English.aspx |
BibTeX data | [bibtex] |
IMM Group(s) | Intelligent Signal Processing |