Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data

Publication, Journal Article
Manrique-Vallier, D; Reiter, JP
Published in: Journal of the American Statistical Association
October 2, 2017

In categorical data, it is typically the case that some combinations of variables are theoretically impossible, such as a 3-year-old child who is married or a man who is pregnant. In practice, however, reported values often include such structural zeros due to, for example, respondent mistakes or data processing errors. To purge data of such errors, many statistical organizations use a process known as edit-imputation. The basic idea is first to select reported values to change according to some heuristic or loss function, and second to replace those values with plausible imputations. This two-stage process typically does not fully use information in the data when determining locations of errors, nor does it appropriately reflect uncertainty resulting from the edits and imputations. We present an alternative approach to editing and imputation for categorical microdata with structural zeros that addresses these shortcomings. Specifically, we use a Bayesian hierarchical model that couples a stochastic model for the measurement error process with a Dirichlet process mixture of multinomial distributions for the underlying, error-free values. The latter model is restricted to have support only on the set of theoretically possible combinations. We illustrate this integrated approach to editing and imputation using simulation studies with data from the 2000 U.S. census, and compare it to a two-stage edit-imputation routine. Supplementary material is available online.
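As a rough illustration of what "structural zeros" mean in practice, the sketch below flags records that fall into impossible combinations. It is not the authors' model: it covers only the detection of violations, not the Bayesian error localization or imputation described in the abstract. The variable names, category labels, and the `EDIT_RULES` table are hypothetical and chosen only to mirror the two examples given above.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
EditRule = Callable[[Record], bool]  # returns True when the combination is impossible

# Hypothetical edit rules mirroring the abstract's examples
# (a married young child, a pregnant man). Category labels are assumptions.
EDIT_RULES: Dict[str, EditRule] = {
    "married_child": lambda r: r["age_group"] == "0-14"
    and r["marital_status"] == "married",
    "pregnant_male": lambda r: r["sex"] == "male" and r["pregnant"] == "yes",
}


def violated_rules(record: Record) -> List[str]:
    """Return the names of all edit rules that the record violates."""
    return [name for name, rule in EDIT_RULES.items() if rule(record)]


# Toy records, invented for illustration only.
records: List[Record] = [
    {"age_group": "30-44", "marital_status": "married", "sex": "female", "pregnant": "yes"},
    {"age_group": "0-14", "marital_status": "married", "sex": "female", "pregnant": "no"},
    {"age_group": "30-44", "marital_status": "single", "sex": "male", "pregnant": "yes"},
]

# Map record index -> violated rules, keeping only records with a violation.
flagged: Dict[int, List[str]] = {}
for index, record in enumerate(records):
    violations = violated_rules(record)
    if violations:
        flagged[index] = violations

print(flagged)
```

A check like this only shows that a record is inconsistent. It does not say which of the involved fields is wrong, which is the error-localization question the paper addresses jointly with imputation.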

Published In

Journal of the American Statistical Association

DOI

10.1080/01621459.2016.1231612

EISSN

1537-274X

ISSN

0162-1459

Publication Date

October 2, 2017

Volume

112

Issue

520

Start / End Page

1708 / 1719

Related Subject Headings

  • Statistics & Probability
  • 4905 Statistics
  • 3802 Econometrics
  • 1603 Demography
  • 1403 Econometrics
  • 0104 Statistics
 

Citation

APA

Manrique-Vallier, D., & Reiter, J. P. (2017). Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data. Journal of the American Statistical Association, 112(520), 1708–1719. https://doi.org/10.1080/01621459.2016.1231612

Chicago

Manrique-Vallier, D., and J. P. Reiter. “Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data.” Journal of the American Statistical Association 112, no. 520 (October 2, 2017): 1708–19. https://doi.org/10.1080/01621459.2016.1231612.

ICMJE

Manrique-Vallier D, Reiter JP. Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data. Journal of the American Statistical Association. 2017 Oct 2;112(520):1708–19.

MLA

Manrique-Vallier, D., and J. P. Reiter. “Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data.” Journal of the American Statistical Association, vol. 112, no. 520, Oct. 2017, pp. 1708–19. Scopus, doi:10.1080/01621459.2016.1231612.

NLM

Manrique-Vallier D, Reiter JP. Bayesian Simultaneous Edit and Imputation for Multivariate Categorical Data. Journal of the American Statistical Association. 2017 Oct 2;112(520):1708–1719.