Scholars@Duke publication: Augmented Latent Dirichlet Allocation (LDA) Topic Model with Gaussian Mixture Topics

Publication, Conference

Prabhudesai, KS; Mainsah, BO; Collins, LM; Throckmorton, CS

Published in: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings

September 10, 2018

Latent Dirichlet allocation (LDA) is a statistical model that is often used to discover topics or themes in a large collection of documents. In the LDA model, topics are modeled as discrete distributions over a finite vocabulary of words. LDA is also a popular choice for modeling other datasets that span a discrete domain, such as those in population genetics and social networks. However, to model data spanning a continuous domain with LDA, the data must first be approximated by discrete values. These discrete approximations can lose information and may not represent the true structure of the underlying data. We present an augmented version of the LDA topic model in which topics are represented by Gaussian mixture models (GMMs), which are multi-modal distributions spanning a continuous domain. We denote this augmentation of the LDA topic model with Gaussian mixture topics as the GMM-LDA model. We use Gibbs sampling to infer model parameters. We demonstrate the utility of the GMM-LDA model by applying it to the problem of clustering sleep states in electroencephalography (EEG) data. Results demonstrate superior clustering performance for our GMM-LDA algorithm compared with standard LDA and other clustering algorithms.
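To make the model structure described in the abstract more concrete, the following is a minimal Python sketch of one plausible generative process for an LDA-style model whose topics are Gaussian mixtures. It is an illustration under stated assumptions, not the authors' implementation: the function name `sample_gmm_lda_corpus`, the isotropic covariances, the symmetric Dirichlet prior `alpha`, and all parameter values are hypothetical, and the Gibbs sampler used for inference in the paper is not shown.

```python
import numpy as np


def sample_gmm_lda_corpus(n_docs=5, n_obs=200, n_topics=3, n_components=2,
                          dim=1, alpha=0.5, seed=0):
    """Draw synthetic continuous 'documents' from an LDA-style model
    whose topics are Gaussian mixtures (all settings are illustrative)."""
    rng = np.random.default_rng(seed)

    # Hypothetical topic parameters: topic k is a GMM with mixing weights
    # weights[k], component means means[k], and isotropic std devs stds[k].
    weights = rng.dirichlet(np.ones(n_components), size=n_topics)     # (K, M)
    means = rng.normal(0.0, 5.0, size=(n_topics, n_components, dim))  # (K, M, D)
    stds = rng.uniform(0.3, 1.0, size=(n_topics, n_components))       # (K, M)

    docs, topic_labels, thetas = [], [], []
    for _ in range(n_docs):
        # Per-document topic proportions, as in standard LDA.
        theta = rng.dirichlet(alpha * np.ones(n_topics))
        # Topic assignment for each observation in the document.
        z = rng.choice(n_topics, size=n_obs, p=theta)
        # Mixture component within the assigned topic's GMM.
        c = np.array([rng.choice(n_components, p=weights[k]) for k in z])
        # Continuous observation drawn from the chosen Gaussian component,
        # replacing the discrete word draw of standard LDA.
        noise = rng.standard_normal((n_obs, dim))
        x = means[z, c] + stds[z, c][:, None] * noise
        docs.append(x)
        topic_labels.append(z)
        thetas.append(theta)
    return docs, topic_labels, thetas


# Example usage: five documents of 200 one-dimensional observations each.
docs, topic_labels, thetas = sample_gmm_lda_corpus()
```

In this sketch the only change from standard LDA is the final step: instead of drawing a word from a discrete topic distribution, each observation is drawn from the assigned topic's Gaussian mixture, so no discretization of the continuous data is needed.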

Duke Scholars

Author: Boyla Octavie Mainsah, Electrical and Computer Engineering

Author: Leslie M. Collins, Electrical and Computer Engineering

Published In: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings

DOI: 10.1109/ICASSP.2018.8462003

ISSN: 1520-6149

Publication Date: September 10, 2018

Volume: 2018-April

Start / End Page: 2451 / 2455

Citation

APA: Prabhudesai, K. S., Mainsah, B. O., Collins, L. M., & Throckmorton, C. S. (2018). Augmented Latent Dirichlet Allocation (LDA) Topic Model with Gaussian Mixture Topics. In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings (Vol. 2018-April, pp. 2451–2455). https://doi.org/10.1109/ICASSP.2018.8462003

Chicago: Prabhudesai, K. S., B. O. Mainsah, L. M. Collins, and C. S. Throckmorton. “Augmented Latent Dirichlet Allocation (LDA) Topic Model with Gaussian Mixture Topics.” In ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 2018-April:2451–55, 2018. https://doi.org/10.1109/ICASSP.2018.8462003.

ICMJE: Prabhudesai KS, Mainsah BO, Collins LM, Throckmorton CS. Augmented Latent Dirichlet Allocation (LDA) Topic Model with Gaussian Mixture Topics. In: ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2018. p. 2451–5.

MLA: Prabhudesai, K. S., et al. “Augmented Latent Dirichlet Allocation (LDA) Topic Model with Gaussian Mixture Topics.” ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, vol. 2018-April, 2018, pp. 2451–55. Scopus, doi:10.1109/ICASSP.2018.8462003.

NLM: Prabhudesai KS, Mainsah BO, Collins LM, Throckmorton CS. Augmented Latent Dirichlet Allocation (LDA) Topic Model with Gaussian Mixture Topics. ICASSP IEEE International Conference on Acoustics Speech and Signal Processing Proceedings. 2018. p. 2451–2455.
