
Leveraging Multiple Representations of Topic Models for Knowledge Discovery

Publication, Journal Article
Potts, CM; Savaliya, A; Jhala, A
Published in: IEEE Access
January 1, 2022

Topic models are often useful for categorizing related documents in information retrieval and knowledge discovery systems, especially for large datasets. Interpreting the output of these models remains an ongoing challenge for the research community. The typical practice when applying topic models is to tune the parameters of a chosen model for a target dataset and select the model with the best output according to a given metric. We present a novel perspective on topic analysis: a process for combining output from multiple models with different theoretical underpinnings. We show that this makes it possible to tackle novel tasks, such as semantic characterization of content, that cannot be carried out with a single model. One example task is to characterize the differences between topics or documents in terms of their purpose and their importance with respect to the underlying output of the discovery algorithm. To show the potential benefit of leveraging multiple models, we present an algorithm to map the term space of Latent Dirichlet Allocation (LDA) to the neural document-embedding space of doc2vec. We also show that by using both models in parallel and analyzing the resulting document distributions with the Normalized Pointwise Mutual Information (NPMI) metric, we can gain insight into the purpose and importance of topics across models. This approach moves beyond topic identification to a richer characterization of the information and provides a better understanding of the complex relationships between these typically competing techniques.
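The abstract names NPMI as the metric used to compare document distributions across the two models but does not spell out the computation. The following is a minimal sketch of the standard NPMI definition applied to two sets of document ids, not the authors' implementation. The function name `npmi`, the set-based representation, and the example variables `lda_topic_docs` and `d2v_cluster_docs` are assumptions made for illustration only.

```python
import math


def npmi(docs_a, docs_b, n_docs):
    """Normalized pointwise mutual information between two document sets.

    docs_a, docs_b: sets of document ids (hypothetical representation, e.g.
    the documents assigned to one LDA topic and to one doc2vec cluster).
    n_docs: total number of documents in the corpus.
    Returns a value in [-1, 1]: -1 for no overlap, 0 for independence,
    1 for complete co-occurrence.
    """
    if n_docs <= 0:
        raise ValueError("n_docs must be positive")
    p_a = len(docs_a) / n_docs
    p_b = len(docs_b) / n_docs
    p_ab = len(docs_a & docs_b) / n_docs
    if p_ab == 0.0:
        # The two sets never co-occur: NPMI takes its limiting value of -1.
        return -1.0
    if p_ab == 1.0:
        # Both sets cover every document: the normalizer is 0, use the limit 1.
        return 1.0
    pmi = math.log(p_ab / (p_a * p_b))
    return pmi / -math.log(p_ab)


# Hypothetical example: documents assigned to one LDA topic and to one
# doc2vec cluster, in a corpus of 8 documents.
lda_topic_docs = {0, 1, 2, 3}
d2v_cluster_docs = {0, 1, 4, 5}
score = npmi(lda_topic_docs, d2v_cluster_docs, n_docs=8)
```

Under this reading, a score near 1 would suggest that a topic and a cluster pick out largely the same documents, while a score near -1 would suggest they rarely overlap. How the paper derives the document sets from each model is not stated in the abstract.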

Published In

IEEE Access

DOI

10.1109/ACCESS.2022.3210529

EISSN

2169-3536

Publication Date

January 1, 2022

Volume

10

Start / End Page

104696 / 104705

Related Subject Headings

  • 46 Information and computing sciences
  • 40 Engineering
  • 10 Technology
  • 09 Engineering
  • 08 Information and Computing Sciences
 

Citation

APA
Potts, C. M., Savaliya, A., & Jhala, A. (2022). Leveraging Multiple Representations of Topic Models for Knowledge Discovery. IEEE Access, 10, 104696–104705. https://doi.org/10.1109/ACCESS.2022.3210529

Chicago
Potts, C. M., A. Savaliya, and A. Jhala. “Leveraging Multiple Representations of Topic Models for Knowledge Discovery.” IEEE Access 10 (January 1, 2022): 104696–705. https://doi.org/10.1109/ACCESS.2022.3210529.

ICMJE
Potts CM, Savaliya A, Jhala A. Leveraging Multiple Representations of Topic Models for Knowledge Discovery. IEEE Access. 2022 Jan 1;10:104696–705.

MLA
Potts, C. M., et al. “Leveraging Multiple Representations of Topic Models for Knowledge Discovery.” IEEE Access, vol. 10, Jan. 2022, pp. 104696–705. Scopus, doi:10.1109/ACCESS.2022.3210529.

NLM
Potts CM, Savaliya A, Jhala A. Leveraging Multiple Representations of Topic Models for Knowledge Discovery. IEEE Access. 2022 Jan 1;10:104696–104705.
