Dirichlet process mixture models for modeling and generating synthetic versions of nested categorical data

Publication, Journal Article
Hu, J; Reiter, JP; Wang, Q
Published in: Bayesian Analysis
January 1, 2018

We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files. Supplementary materials (Hu et al., 2017) for this article are available online.
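
The nested structure described in the abstract can be illustrated with a small generative sketch. This is not the authors' implementation: it assumes a fixed, finite number of latent classes in place of the Dirichlet process priors, uses made-up probabilities, and enforces one invented "impossible combination" rule (a household with no adult) by rejection. All names (`GROUP_CLASS_PROBS`, `sample_household`, `is_possible`, and so on) are hypothetical.

```python
import random

# Hypothetical toy parameters (assumptions, not values from the paper):
# 2 group-level classes, each with 2 nested unit-level classes.
GROUP_CLASS_PROBS = [0.6, 0.4]                              # P(group class g)
UNIT_CLASS_PROBS = [[0.7, 0.3], [0.2, 0.8]]                 # P(unit class m | g)
HOUSEHOLD_SIZE_PROBS = [[0.5, 0.3, 0.2], [0.1, 0.3, 0.6]]   # P(size 1..3 | g)
TENURE_PROBS = [[0.8, 0.2], [0.3, 0.7]]                     # P(own, rent | g)
AGE_PROBS = [                                               # P(child, adult | g, m)
    [[0.1, 0.9], [0.6, 0.4]],
    [[0.5, 0.5], [0.05, 0.95]],
]


def draw(rng, probs):
    """Draw one category index from a discrete distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def is_possible(household):
    """Illustrative structural-zero rule: at least one member is an adult."""
    return any(age == 1 for age in household["ages"])


def sample_household(rng):
    """Sample one synthetic household, rejecting impossible combinations."""
    while True:
        g = draw(rng, GROUP_CLASS_PROBS)             # (i) group-level class
        size = draw(rng, HOUSEHOLD_SIZE_PROBS[g]) + 1
        tenure = draw(rng, TENURE_PROBS[g])          # group-level variable
        # (ii) each unit gets a unit-level class nested within class g
        unit_classes = [draw(rng, UNIT_CLASS_PROBS[g]) for _ in range(size)]
        # unit-level variable depends on both the group and unit class
        ages = [draw(rng, AGE_PROBS[g][m]) for m in unit_classes]
        household = {"group_class": g, "size": size, "tenure": tenure,
                     "unit_classes": unit_classes, "ages": ages}
        if is_possible(household):                   # zero mass on impossible cases
            return household


rng = random.Random(0)
synthetic = [sample_household(rng) for _ in range(5)]
for household in synthetic:
    print(household)
```

Because members of a household share the group-level class `g`, their unit-level variables are dependent even though each is drawn separately, which is the mechanism the abstract refers to when it says the structure captures dependence among units in the same group.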

Published In

Bayesian Analysis

DOI

10.1214/16-BA1047

EISSN

1931-6690

ISSN

1936-0975

Publication Date

January 1, 2018

Volume

13

Issue

1

Start / End Page

183 / 200

Related Subject Headings

  • Statistics & Probability
  • 4905 Statistics
  • 0104 Statistics
 

Citation

APA: Hu, J., Reiter, J. P., & Wang, Q. (2018). Dirichlet process mixture models for modeling and generating synthetic versions of nested categorical data. Bayesian Analysis, 13(1), 183–200. https://doi.org/10.1214/16-BA1047
Chicago: Hu, J., J. P. Reiter, and Q. Wang. “Dirichlet process mixture models for modeling and generating synthetic versions of nested categorical data.” Bayesian Analysis 13, no. 1 (January 1, 2018): 183–200. https://doi.org/10.1214/16-BA1047.
ICMJE: Hu J, Reiter JP, Wang Q. Dirichlet process mixture models for modeling and generating synthetic versions of nested categorical data. Bayesian Analysis. 2018 Jan 1;13(1):183–200.
MLA: Hu, J., et al. “Dirichlet process mixture models for modeling and generating synthetic versions of nested categorical data.” Bayesian Analysis, vol. 13, no. 1, Jan. 2018, pp. 183–200. Scopus, doi:10.1214/16-BA1047.
NLM: Hu J, Reiter JP, Wang Q. Dirichlet process mixture models for modeling and generating synthetic versions of nested categorical data. Bayesian Analysis. 2018 Jan 1;13(1):183–200.
