Scholars@Duke publication: Assessing disclosure risks for synthetic data with arbitrary intruder knowledge

Assessing disclosure risks for synthetic data with arbitrary intruder knowledge

Publication, Journal Article

McClure, D; Reiter, JP

Published in: Statistical Journal of the IAOS

February 27, 2016

Several statistical agencies release synthetic microdata, i.e., data with all confidential values replaced with draws from statistical models, in order to protect data subjects' confidentiality. While fully synthetic data are safe from record linkage attacks, intruders might be able to use the released synthetic values to estimate confidential values for individuals in the collected data. We demonstrate and investigate this potential risk using two simple but informative scenarios: a single continuous variable possibly with outliers, and a three-way contingency table possibly with small counts in some cells. Beginning with the case that the intruder knows all but one value in the confidential data, we examine the effect on risk of decreasing the number of observations the intruder knows beforehand. We generally find that releasing synthetic data (1) can pose little risk to records in the middle of the distribution, and (2) can pose some risks to extreme outliers, although arguably these risks are mild. We also find that the effect of removing observations from an intruder's background knowledge heavily depends on how well that intruder can fill in those missing observations: the risk remains fairly constant if he/she can fill them in well, and drops quickly if he/she cannot.

Duke Scholars

Author: Jerome P. Reiter, Statistical Science

Published In: Statistical Journal of the IAOS
DOI: 10.3233/SJI-160957
ISSN: 1874-7655
Publication Date: February 27, 2016
Volume: 32
Issue: 1
Start / End Page: 109 / 126
Related Subject Headings: Economics; 4905 Statistics; 0104 Statistics

Citation

APA: McClure, D., & Reiter, J. P. (2016). Assessing disclosure risks for synthetic data with arbitrary intruder knowledge. Statistical Journal of the IAOS, 32(1), 109–126. https://doi.org/10.3233/SJI-160957

Chicago: McClure, D., and J. P. Reiter. “Assessing disclosure risks for synthetic data with arbitrary intruder knowledge.” Statistical Journal of the IAOS 32, no. 1 (February 27, 2016): 109–26. https://doi.org/10.3233/SJI-160957.

ICMJE: McClure D, Reiter JP. Assessing disclosure risks for synthetic data with arbitrary intruder knowledge. Statistical Journal of the IAOS. 2016 Feb 27;32(1):109–26.

MLA: McClure, D., and J. P. Reiter. “Assessing disclosure risks for synthetic data with arbitrary intruder knowledge.” Statistical Journal of the IAOS, vol. 32, no. 1, Feb. 2016, pp. 109–26. Scopus, doi:10.3233/SJI-160957.

NLM: McClure D, Reiter JP. Assessing disclosure risks for synthetic data with arbitrary intruder knowledge. Statistical Journal of the IAOS. 2016 Feb 27;32(1):109–126.
