An alternative prior process for nonparametric Bayesian clustering

Published

Journal Article

Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly used priors are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit "rich-get-richer" characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering, the uniform process, for applications where the "rich-get-richer" property is undesirable. We also explore the cost of this process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. We compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings. Copyright 2010 by the authors.
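To make the "rich-get-richer" contrast concrete, here is a minimal Python sketch of the standard sequential predictive rules, i.e. the probability that the next item joins each existing cluster or starts a new one. It assumes the usual parameterization: a concentration parameter `theta` and, for Pitman-Yor, a discount `alpha` with 0 <= alpha < 1. The function names and the example partition are hypothetical illustrations, not code from the paper.

```python
from typing import List


def dirichlet_process(sizes: List[int], theta: float) -> List[float]:
    """Dirichlet process (Chinese restaurant process) predictive rule.

    Existing cluster k: n_k / (N + theta); new cluster: theta / (N + theta).
    The last entry of the returned list is the new-cluster probability.
    """
    n = sum(sizes)
    return [s / (n + theta) for s in sizes] + [theta / (n + theta)]


def pitman_yor_process(sizes: List[int], theta: float, alpha: float) -> List[float]:
    """Pitman-Yor predictive rule (assumes 0 <= alpha < 1, theta > -alpha).

    Existing cluster k: (n_k - alpha) / (N + theta);
    new cluster: (theta + K * alpha) / (N + theta).
    """
    n, k = sum(sizes), len(sizes)
    return [(s - alpha) / (n + theta) for s in sizes] + [(theta + k * alpha) / (n + theta)]


def uniform_process(sizes: List[int], theta: float) -> List[float]:
    """Uniform process predictive rule: depends only on K, not on cluster sizes.

    Existing cluster k: 1 / (K + theta); new cluster: theta / (K + theta).
    """
    k = len(sizes)
    return [1 / (k + theta) for _ in sizes] + [theta / (k + theta)]


if __name__ == "__main__":
    sizes = [8, 1, 1]  # hypothetical partition of N = 10 items into K = 3 clusters
    for name, probs in [
        ("DP", dirichlet_process(sizes, theta=1.0)),
        ("PYP", pitman_yor_process(sizes, theta=1.0, alpha=0.5)),
        ("UP", uniform_process(sizes, theta=1.0)),
    ]:
        print(name, [round(p, 3) for p in probs])
    # DP  [0.727, 0.091, 0.091, 0.091]
    # PYP [0.682, 0.045, 0.045, 0.227]
    # UP  [0.25, 0.25, 0.25, 0.25]
```

Under the Dirichlet and Pitman-Yor rules the large cluster attracts most of the mass, while the uniform process gives it the same weight as each singleton. Because the uniform rule ignores cluster sizes, applying it item by item yields partition probabilities that can depend on the order in which items arrive, which is the loss of exchangeability described above.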

Cited Authors

  • Wallach, HM; Jensen, ST; Dicker, L; Heller, KA

Published Date

  • December 1, 2010

Volume / Issue

  • 9

Start / End Page

  • 892 - 899

Electronic International Standard Serial Number (EISSN)

  • 1533-7928

International Standard Serial Number (ISSN)

  • 1532-4435

Citation Source

  • Scopus