Disclosure risk evaluation for fully synthetic categorical data

Publication, Conference

Hu, J; Reiter, JP; Wang, Q

Published in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

January 1, 2014

We present an approach for evaluating disclosure risks for fully synthetic categorical data. The basic idea is to compute probability distributions of unknown confidential data values given the synthetic data and assumptions about intruder knowledge. We use a “worst-case” scenario of an intruder knowing all but one of the records in the confidential data. To create the synthetic data, we use a Dirichlet process mixture of products of multinomial distributions, which is a Bayesian version of a latent class model. In addition to generating synthetic data with high utility, the likelihood function admits simple and convenient approximations to the disclosure risk probabilities via importance sampling. We illustrate the disclosure risk computations by synthesizing a subset of data from the American Community Survey.
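As a rough illustration of the kind of computation the abstract describes, the sketch below estimates the posterior probability of one unknown confidential value given synthetic data and an intruder who knows every other record. It is a toy under stated assumptions, not the authors' implementation: it swaps the Dirichlet process mixture of products of multinomials for a single categorical variable with a Dirichlet-multinomial model, so that an exact answer exists for comparison. The function names (`exact_risk`, `importance_risk`), the parameter names, and the example counts are all hypothetical. The importance sampling step draws parameters once from the posterior given the true data and reweights those draws for each candidate value of the unknown record.

```python
import math
import random


def log_beta(a):
    """Log of the multivariate beta function B(a)."""
    return sum(math.lgamma(x) for x in a) - math.lgamma(sum(a))


def log_sum_exp(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def normalise(log_p):
    """Turn unnormalised log probabilities into probabilities."""
    total = log_sum_exp(log_p)
    return [math.exp(v - total) for v in log_p]


def exact_risk(known, synth, alpha, prior):
    """Exact p(y_i = c | Z, D_-i) for the toy Dirichlet-multinomial model.

    known: category counts of the records the intruder knows (D_-i)
    synth: category counts of the released synthetic data (Z)
    alpha: symmetric Dirichlet prior parameter (assumed)
    prior: intruder's prior over the unknown value (assumed)
    """
    log_p = []
    for c in range(len(known)):
        post = [alpha + n + (1 if k == c else 0) for k, n in enumerate(known)]
        post_z = [a + z for a, z in zip(post, synth)]
        # p(Z | D^c) = B(post + z) / B(post), up to a constant shared by all c
        log_p.append(math.log(prior[c]) + log_beta(post_z) - log_beta(post))
    return normalise(log_p)


def importance_risk(known, true_value, synth, alpha, prior, draws=20000, seed=1):
    """Self-normalised importance sampling estimate of the same quantity.

    Proposals are drawn from p(theta | D) with D the true data; for a
    candidate value c the weight is proportional to theta_c / theta_true.
    """
    rng = random.Random(seed)
    shape = [alpha + n + (1 if k == true_value else 0)
             for k, n in enumerate(known)]
    log_theta = []
    for _ in range(draws):
        # Dirichlet draw via normalised gamma variates
        g = [rng.gammavariate(a, 1.0) for a in shape]
        total = sum(g)
        log_theta.append([math.log(x / total) for x in g])
    # log p(Z | theta) for each draw (multinomial coefficient omitted)
    log_lik = [sum(z * lt for z, lt in zip(synth, row)) for row in log_theta]
    log_p = []
    for c in range(len(known)):
        log_w = [row[c] - row[true_value] for row in log_theta]
        num = log_sum_exp([w + l for w, l in zip(log_w, log_lik)])
        log_p.append(math.log(prior[c]) + num - log_sum_exp(log_w))
    return normalise(log_p)


if __name__ == "__main__":
    known = [30, 15, 5]    # hypothetical counts known to the intruder
    synth = [28, 16, 6]    # hypothetical synthetic data counts
    prior = [1 / 3] * 3    # uniform prior over the unknown value
    print(exact_risk(known, synth, 1.0, prior))
    print(importance_risk(known, 2, synth, 1.0, prior))
```

In this toy setting the two functions should agree up to Monte Carlo error. Reusing one set of posterior draws for every candidate value is what makes the approximation cheap: the model is fitted once on the confidential data rather than once per candidate.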

Duke Scholars

Author: Jerome P. Reiter (Statistical Science)

Published In

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

DOI

10.1007/978-3-319-11257-2_15

EISSN

1611-3349

ISSN

0302-9743

ISBN

9783319112565

Publication Date

January 1, 2014

Volume

8744

Start / End Page

185 / 199

Related Subject Headings

Artificial Intelligence & Image Processing
46 Information and computing sciences

Citation

APA

Hu, J., Reiter, J. P., & Wang, Q. (2014). Disclosure risk evaluation for fully synthetic categorical data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8744, pp. 185–199). https://doi.org/10.1007/978-3-319-11257-2_15

Chicago

Hu, J., J. P. Reiter, and Q. Wang. “Disclosure risk evaluation for fully synthetic categorical data.” In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8744:185–99, 2014. https://doi.org/10.1007/978-3-319-11257-2_15.

ICMJE

Hu J, Reiter JP, Wang Q. Disclosure risk evaluation for fully synthetic categorical data. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014. p. 185–99.

MLA

Hu, J., et al. “Disclosure risk evaluation for fully synthetic categorical data.” Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 8744, 2014, pp. 185–99. Scopus, doi:10.1007/978-3-319-11257-2_15.

NLM

Hu J, Reiter JP, Wang Q. Disclosure risk evaluation for fully synthetic categorical data. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2014. p. 185–199.
