Disclosure risk evaluation for fully synthetic categorical data

Publication, Conference
Hu, J; Reiter, JP; Wang, Q
Published in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
January 1, 2014

We present an approach for evaluating disclosure risks for fully synthetic categorical data. The basic idea is to compute probability distributions of unknown confidential data values given the synthetic data and assumptions about intruder knowledge. We use a “worst-case” scenario of an intruder knowing all but one of the records in the confidential data. To create the synthetic data, we use a Dirichlet process mixture of products of multinomial distributions, which is a Bayesian version of a latent class model. In addition to generating synthetic data with high utility, the likelihood function admits simple and convenient approximations to the disclosure risk probabilities via importance sampling. We illustrate the disclosure risk computations by synthesizing a subset of data from the American Community Survey.
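The importance sampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: instead of the Dirichlet process mixture of products of multinomials, it assumes a single categorical variable with a conjugate Dirichlet-multinomial synthesizer, a uniform intruder prior over the unknown value, and an intruder who knows every record except one. The function name `disclosure_posterior` and all of its parameters are hypothetical. Posterior draws of the cell probabilities given the true data are reweighted by the likelihood ratio of each candidate value to the true value, which approximates the probability of the released synthetic data under each candidate.

```python
import numpy as np


def disclosure_posterior(known_counts, true_value, synth_counts,
                         n_draws=2000, alpha=1.0, seed=0):
    """Approximate p(Y_i = y | Z, Y_{-i}) for every candidate value y.

    known_counts : category counts of the records the intruder knows (Y_{-i})
    true_value   : index of the target record's true category (agency side)
    synth_counts : category counts of the released synthetic data Z
    Assumption: Dirichlet(alpha) prior and a uniform intruder prior on Y_i.
    """
    rng = np.random.default_rng(seed)
    known_counts = np.asarray(known_counts, dtype=float)
    synth_counts = np.asarray(synth_counts, dtype=float)
    k = known_counts.size

    # Draws of theta from the posterior given the full confidential data.
    full_counts = known_counts.copy()
    full_counts[true_value] += 1
    theta = rng.dirichlet(alpha + full_counts, size=n_draws)  # shape (H, K)
    log_theta = np.log(theta)

    # log p(Z | theta_h), dropping the multinomial constant (it cancels).
    log_lik_z = log_theta @ synth_counts  # shape (H,)

    log_post = np.empty(k)
    for y in range(k):
        # Importance weight: swap the true record for candidate y.
        log_w = log_theta[:, y] - log_theta[:, true_value]
        # Self-normalized estimate of p(Z | Y_i = y, Y_{-i}).
        log_num = np.logaddexp.reduce(log_w + log_lik_z)
        log_den = np.logaddexp.reduce(log_w)
        log_post[y] = log_num - log_den  # uniform prior adds a constant

    log_post -= np.logaddexp.reduce(log_post)  # normalize over candidates
    return np.exp(log_post)


if __name__ == "__main__":
    probs = disclosure_posterior([30, 10, 5], true_value=0,
                                 synth_counts=[60, 25, 15])
    print(probs)
```

A large probability on the true value relative to the uniform prior would suggest elevated risk for that record under these assumptions; one set of posterior draws is reused for every candidate, so the model does not have to be refit for each one.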

Published In

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

DOI

10.1007/978-3-319-11257-2_15

EISSN

1611-3349

ISSN

0302-9743

ISBN

9783319112565

Publication Date

January 1, 2014

Volume

8744

Start / End Page

185 / 199

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences
 

Citation

APA: Hu, J., Reiter, J. P., & Wang, Q. (2014). Disclosure risk evaluation for fully synthetic categorical data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8744, pp. 185–199). https://doi.org/10.1007/978-3-319-11257-2_15

Chicago: Hu, J., J. P. Reiter, and Q. Wang. “Disclosure risk evaluation for fully synthetic categorical data.” In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8744:185–99, 2014. https://doi.org/10.1007/978-3-319-11257-2_15.

ICMJE: Hu J, Reiter JP, Wang Q. Disclosure risk evaluation for fully synthetic categorical data. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014. p. 185–99.

MLA: Hu, J., et al. “Disclosure risk evaluation for fully synthetic categorical data.” Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8744, 2014, pp. 185–99. Scopus, doi:10.1007/978-3-319-11257-2_15.

NLM: Hu J, Reiter JP, Wang Q. Disclosure risk evaluation for fully synthetic categorical data. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014. p. 185–199.