Estimating risks of identification disclosure in microdata
When statistical agencies release microdata to the public, malicious users (intruders) may be able to link records in the released data to records in external databases. Releasing data in ways that fail to prevent such identifications may discredit the agency or, for some data, constitute a breach of law. To limit disclosures, agencies often release altered versions of the data; however, there usually remain risks of identification. This article applies and extends the framework developed by Duncan and Lambert for computing probabilities of identification for sampled units. It describes methods tailored specifically to data altered by receding and topcoding variables, data swapping, or adding random noise (and combinations of these common data alteration techniques) that agencies can use to assess threats from intruders who possess information on relationships among variables and the methods of data alteration. Using data from the Current Population Survey, the article illustrates a step-by-step process for evaluating identification disclosure risks for competing releases under varying assumptions of intruders' knowledge. Risk measures are presented for individual units and for entire datasets. © 2005 American Statistical Association.
Volume / Issue
Start / End Page
International Standard Serial Number (ISSN)
Digital Object Identifier (DOI)