Estimating risks of identification disclosure in microdata
When statistical agencies release microdata to the public, malicious users (intruders) may be able to link records in the released data to records in external databases. Releasing data in ways that fail to prevent such identifications may discredit the agency or, for some data, constitute a breach of law. To limit disclosures, agencies often release altered versions of the data; however, some risk of identification usually remains. This article applies and extends the framework developed by Duncan and Lambert for computing probabilities of identification for sampled units. It describes methods, tailored specifically to data altered by recoding and topcoding variables, swapping data, or adding random noise (and combinations of these common alteration techniques), that agencies can use to assess threats from intruders who possess information on relationships among variables and on the methods of data alteration. Using data from the Current Population Survey, the article illustrates a step-by-step process for evaluating identification disclosure risks for competing releases under varying assumptions about intruders' knowledge. Risk measures are presented for individual units and for entire datasets. © 2005 American Statistical Association.
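The abstract does not give formulas, so the following is only a minimal Python sketch of the general idea, under simplifying assumptions that are not taken from the article. An intruder who knows a target's sex and true age, and who knows that age was released with additive Gaussian noise of known standard deviation, computes a probability that each released record belongs to the target (uniform prior over released records, exact agreement required on sex). The agency, which knows the true links, then scores risk per unit (probability assigned to the correct record) and for the whole file. The names `Record`, `match_probabilities` and `risk_summaries`, the noise model, and the three file-level summaries (expected match risk, true match rate, false match rate) are hypothetical, illustrative choices rather than the article's exact definitions.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    rec_id: int  # hypothetical ID; known to the agency only, used to score risk
    sex: str     # assumed released unaltered, so the intruder requires exact agreement
    age: float   # assumed released as true age + N(0, noise_sd**2) noise


def normal_pdf(x: float, mean: float, sd: float) -> float:
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def match_probabilities(target: Record, released: list[Record], noise_sd: float) -> list[float]:
    """Intruder's Pr(record j is the target) for every released record j.

    Assumes a uniform prior over released records, exact agreement on sex,
    and Gaussian noise on age with a standard deviation the intruder knows.
    """
    weights = [
        normal_pdf(rec.age, target.age, noise_sd) if rec.sex == target.sex else 0.0
        for rec in released
    ]
    total = sum(weights)
    if total == 0.0:
        return [0.0] * len(released)
    return [w / total for w in weights]


def risk_summaries(targets: list[Record], released: list[Record], noise_sd: float) -> dict:
    per_unit = {}          # probability the intruder assigns to the correct record
    expected_match_risk = 0.0
    true_unique = 0        # unique best match that is the correct record
    false_unique = 0       # unique best match that is the wrong record
    for t in targets:
        probs = match_probabilities(t, released, noise_sd)
        per_unit[t.rec_id] = next(
            (p for rec, p in zip(released, probs) if rec.rec_id == t.rec_id), 0.0
        )
        best = max(probs)
        if best == 0.0:
            continue  # no released record is consistent with this target
        tied = [rec.rec_id for rec, p in zip(released, probs)
                if math.isclose(p, best, rel_tol=1e-9)]
        hit = t.rec_id in tied
        if hit:
            # the intruder picks uniformly among tied records
            expected_match_risk += 1.0 / len(tied)
        if len(tied) == 1:
            if hit:
                true_unique += 1
            else:
                false_unique += 1
    unique = true_unique + false_unique
    return {
        "per_unit": per_unit,
        "expected_match_risk": expected_match_risk,
        "true_match_rate": true_unique / len(targets),
        "false_match_rate": false_unique / unique if unique else 0.0,
    }


# Toy data (invented for illustration): released ages are noisy versions of the targets' ages.
released = [Record(1, "F", 34.6), Record(2, "F", 43.0), Record(3, "M", 35.1), Record(4, "M", 52.8)]
targets = [Record(1, "F", 35.0), Record(2, "F", 38.0), Record(3, "M", 36.0), Record(4, "M", 50.0)]
summary = risk_summaries(targets, released, noise_sd=2.0)
# expected_match_risk 3.0, true_match_rate 0.75, false_match_rate 0.25
print(summary)
```

In this toy file, target 2's noisy age lands closer to record 1, so the intruder makes a confident but wrong link; the other three targets are correctly and uniquely identified. Comparing these summaries across candidate releases (for example, different `noise_sd` values) mirrors, in a very simplified form, the comparison of competing releases that the abstract describes.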
Related Subject Headings
- Statistics & Probability
- 4905 Statistics
- 3802 Econometrics
- 1603 Demography
- 1403 Econometrics
- 0104 Statistics