Augmented Latent Dirichlet Allocation (LDA) Topic Model with Gaussian Mixture Topics

Conference Paper

Latent Dirichlet allocation (LDA) is a statistical model often used to discover topics or themes in a large collection of documents. In the LDA model, topics are modeled as discrete distributions over a finite vocabulary of words. LDA is also a popular choice for modeling other datasets spanning a discrete domain, such as population genetics and social networks. However, modeling data spanning a continuous domain with LDA requires discrete approximations of the data, which can lose information and may not represent the true structure of the underlying data. We present an augmented version of the LDA topic model in which topics are represented using Gaussian mixture models (GMMs), multi-modal distributions spanning a continuous domain. We denote this augmented model GMM-LDA and use Gibbs sampling to infer its parameters. We demonstrate the utility of the GMM-LDA model by applying it to the problem of clustering sleep states in electroencephalography (EEG) data. Results demonstrate superior clustering performance with our GMM-LDA algorithm compared to standard LDA and other clustering algorithms.
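The core idea in the abstract can be illustrated with a toy sketch: in the per-observation Gibbs step of LDA, the discrete word likelihood is replaced by a Gaussian mixture density, so continuous observations need no discretization. The following is a minimal, hypothetical illustration (not the paper's implementation): two topics with fixed, known 1-D GMM parameters, a symmetric Dirichlet prior over a single document's topic proportions (collapsed out, LDA-style), and Gibbs sampling of per-observation topic assignments. All names and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: K = 2 topics, each a two-component 1-D GMM
# with fixed (known) parameters. In the full model these would also be
# inferred; here they are assumed for illustration.
K = 2                                            # number of topics
alpha = 1.0                                      # symmetric Dirichlet hyperparameter
topic_weights = np.array([[0.5, 0.5],            # mixture weights per topic
                          [0.5, 0.5]])
topic_means = np.array([[-4.0, -2.0],            # component means per topic
                        [2.0, 4.0]])
sigma = 0.5                                      # shared component std dev

def gmm_pdf(x, k):
    """Density of a scalar observation x under topic k's Gaussian mixture."""
    comps = np.exp(-0.5 * ((x - topic_means[k]) / sigma) ** 2) \
            / (sigma * np.sqrt(2.0 * np.pi))
    return float(topic_weights[k] @ comps)

def gibbs_sample_doc(x, n_iter=200):
    """Collapsed Gibbs sampling of topic assignments z for one document
    of continuous observations x. This mirrors LDA's word->topic
    resampling step, with the discrete topic-word probability replaced
    by the topic's GMM density."""
    N = len(x)
    z = rng.integers(K, size=N)                  # random initial assignments
    counts = np.bincount(z, minlength=K).astype(float)
    for _ in range(n_iter):
        for n in range(N):
            counts[z[n]] -= 1                    # remove current assignment
            # p(z_n = k | z_-n, x) ∝ (n_k + alpha) * p(x_n | topic k)
            p = np.array([(counts[k] + alpha) * gmm_pdf(x[n], k)
                          for k in range(K)])
            p /= p.sum()
            z[n] = rng.choice(K, p=p)            # resample assignment
            counts[z[n]] += 1
    return z, counts / counts.sum()

# Synthetic document drawn mostly from topic 1's mixture components
doc = np.concatenate([rng.normal(2.0, 0.5, 15),
                      rng.normal(4.0, 0.5, 15),
                      rng.normal(-4.0, 0.5, 3)])
z, theta = gibbs_sample_doc(doc)
print(theta)  # estimated topic proportions; topic 1 should dominate
```

Because the observations stay continuous throughout, no information is lost to binning; the only change relative to standard collapsed Gibbs for LDA is swapping the categorical word likelihood for a continuous mixture density.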

Cited Authors

  • Prabhudesai, KS; Mainsah, BO; Collins, LM; Throckmorton, CS

Published Date

  • September 10, 2018

Volume / Issue

  • 2018-April

Start / End Page

  • 2451 - 2455

International Standard Serial Number (ISSN)

  • 1520-6149

International Standard Book Number 13 (ISBN-13)

  • 9781538646588

Digital Object Identifier (DOI)

  • 10.1109/ICASSP.2018.8462003

Citation Source

  • Scopus