Disclosure risk evaluation for fully synthetic categorical data

Publication, Journal Article

Hu, J; Reiter, JP; Wang, Q

Published in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

January 1, 2014

© Springer International Publishing Switzerland 2014. We present an approach for evaluating disclosure risks for fully synthetic categorical data. The basic idea is to compute probability distributions of unknown confidential data values given the synthetic data and assumptions about intruder knowledge. We use a “worst-case” scenario of an intruder knowing all but one of the records in the confidential data. To create the synthetic data, we use a Dirichlet process mixture of products of multinomial distributions, which is a Bayesian version of a latent class model. In addition to generating synthetic data with high utility, the likelihood function admits simple and convenient approximations to the disclosure risk probabilities via importance sampling. We illustrate the disclosure risk computations by synthesizing a subset of data from the American Community Survey.
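The core computation described in the abstract, a probability distribution over one unknown confidential value given the synthetic data and an intruder who knows every other record, can be illustrated with a much simpler model than the one used in the paper. The sketch below is an assumption-laden toy, not the authors' method: it replaces the Dirichlet process mixture of products of multinomials with a single categorical variable under a Dirichlet-multinomial model, so the likelihood of the synthetic data is available in closed form and no importance sampling is needed. The function names, the symmetric Dirichlet prior, and the uniform default intruder prior are all hypothetical choices made for illustration.

```python
from math import exp, lgamma, log


def log_synthetic_likelihood(synthetic_counts, alpha):
    """Log probability of the synthetic records (as an ordered sequence)
    under a Dirichlet-multinomial with Dirichlet parameters `alpha`.

    Assumption: the synthetic data are drawn from the posterior predictive
    distribution of a single categorical variable.
    """
    total_alpha = sum(alpha)
    total_synthetic = sum(synthetic_counts)
    value = lgamma(total_alpha) - lgamma(total_alpha + total_synthetic)
    for a, c in zip(alpha, synthetic_counts):
        value += lgamma(a + c) - lgamma(a)
    return value


def disclosure_posterior(known_counts, synthetic_counts,
                         prior_alpha=1.0, intruder_prior=None):
    """Posterior over the one unknown confidential value.

    known_counts: category counts of the records the intruder already knows
                  (all records except the target, the "worst-case" scenario).
    synthetic_counts: category counts in the released synthetic data.
    prior_alpha: symmetric Dirichlet prior parameter (hypothetical choice).
    intruder_prior: intruder's prior over the unknown value (uniform if None).
    """
    n_categories = len(known_counts)
    if intruder_prior is None:
        intruder_prior = [1.0 / n_categories] * n_categories

    log_weights = []
    for candidate in range(n_categories):
        # Pretend the unknown record equals `candidate`, then ask how likely
        # the observed synthetic data would be under that completed data set.
        alpha = [prior_alpha + n + (1 if j == candidate else 0)
                 for j, n in enumerate(known_counts)]
        log_weights.append(log(intruder_prior[candidate])
                           + log_synthetic_likelihood(synthetic_counts, alpha))

    # Normalize on the log scale for numerical stability.
    largest = max(log_weights)
    weights = [exp(w - largest) for w in log_weights]
    normalizer = sum(weights)
    return [w / normalizer for w in weights]


if __name__ == "__main__":
    # Intruder knows 10 records with counts (5, 3, 2); 10 synthetic records
    # with counts (6, 3, 1) have been released.
    print(disclosure_posterior([5, 3, 2], [6, 3, 1]))
```

With these toy inputs the posterior is roughly (0.393, 0.344, 0.262): the synthetic data shift the intruder's belief only modestly away from the uniform prior. In the paper's setting the likelihood is not available in closed form, which is where the importance sampling approximation mentioned in the abstract comes in.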

Duke Scholars

Author: Jerome P. Reiter, Statistical Science

Published In

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

EISSN

1611-3349

ISSN

0302-9743

Publication Date

January 1, 2014

Volume

8744

Start / End Page

185 / 199

Related Subject Headings

Artificial Intelligence & Image Processing

Citation

Hu, J., Reiter, J. P., & Wang, Q. (2014). Disclosure risk evaluation for fully synthetic categorical data. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8744, 185–199.

Hu, J., J. P. Reiter, and Q. Wang. “Disclosure risk evaluation for fully synthetic categorical data.” Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) 8744 (January 1, 2014): 185–99.

Hu J, Reiter JP, Wang Q. Disclosure risk evaluation for fully synthetic categorical data. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014 Jan 1;8744:185–99.

Hu, J., et al. “Disclosure risk evaluation for fully synthetic categorical data.” Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8744, Jan. 2014, pp. 185–99.
