Disclosure risk and data utility for partially synthetic data: An empirical study using the German IAB establishment survey
Statistical agencies that disseminate data to the public must protect the confidentiality of respondents' identities and sensitive attributes. To satisfy these requirements, agencies can release the units originally surveyed with some values, such as sensitive values at high risk of disclosure or values of key identifiers, replaced with multiple imputations. These are called partially synthetic data. In this article, we empirically examine trade-offs between inferential accuracy and confidentiality risks for partially synthetic data, with emphasis on the role of the number of released datasets. We also present a two-stage imputation scheme that allows agencies to release different numbers of imputations for different variables. This scheme can result in lower disclosure risks and higher data utility than the typical one-stage imputation with the same number of released datasets. The empirical analyses are based on partial synthesis of the German IAB Establishment Survey. Copyright © 1996-2010, Statistics Sweden.
Related Subject Headings
- 4905 Statistics
- 1603 Demography
- 0104 Statistics