Estimating risks of identification disclosure in microdata

Journal Article (Review; Journal)

When statistical agencies release microdata to the public, malicious users (intruders) may be able to link records in the released data to records in external databases. Releasing data in ways that fail to prevent such identifications may discredit the agency or, for some data, constitute a breach of law. To limit disclosures, agencies often release altered versions of the data; however, some risk of identification usually remains. This article applies and extends the framework developed by Duncan and Lambert for computing probabilities of identification for sampled units. It describes methods, tailored specifically to data altered by recoding and topcoding variables, data swapping, or adding random noise (and combinations of these common data alteration techniques), that agencies can use to assess threats from intruders who possess information on relationships among variables and on the methods of data alteration. Using data from the Current Population Survey, the article illustrates a step-by-step process for evaluating identification disclosure risks for competing releases under varying assumptions about intruders' knowledge. Risk measures are presented for individual units and for entire datasets. © 2005 American Statistical Association.
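The abstract describes computing, for each target an intruder is searching for, the probability that each released record is that target, and then summarizing those probabilities for individual units and for a whole file. The Python sketch below illustrates the idea for just one of the alteration methods mentioned, additive random noise, under loudly stated assumptions: the intruder knows the target's exact values and the noise standard deviation, each released value equals the true value plus independent Gaussian noise, and every released record is a priori equally likely to be the target. The function names (`match_probabilities`, `dataset_risk`), the tie-handling rule, the three file-level summaries, and the toy data are hypothetical illustrations, not the article's own definitions or results; swapping, recoding, and topcoding are not modeled.

```python
import math
from typing import Dict, List, Sequence, Tuple

Record = Tuple[float, ...]


def match_probabilities(target: Sequence[float],
                        released: Sequence[Record],
                        noise_sd: float) -> List[float]:
    """Probability that each released record belongs to the intruder's target.

    Hypothetical model: each released value equals the true value plus
    independent N(0, noise_sd**2) noise, and every released record is a
    priori equally likely to be the target.
    """
    # Gaussian log-likelihood up to a constant shared by all records.
    loglik = [
        -sum((r - t) ** 2 for r, t in zip(rec, target)) / (2 * noise_sd ** 2)
        for rec in released
    ]
    top = max(loglik)  # subtract the maximum to avoid underflow
    weights = [math.exp(ll - top) for ll in loglik]
    total = sum(weights)
    return [w / total for w in weights]


def dataset_risk(targets: Sequence[Record],
                 released: Sequence[Record],
                 true_index: Sequence[int],
                 noise_sd: float,
                 tol: float = 1e-12) -> Dict[str, float]:
    """Hypothetical file-level summaries built from per-target probabilities."""
    expected_matches = 0.0  # expected correct identifications, ties guessed at random
    unique_true = 0         # targets whose single most likely record is correct
    unique_false = 0        # targets whose single most likely record is wrong
    for target, j_true in zip(targets, true_index):
        probs = match_probabilities(target, released, noise_sd)
        best = max(probs)
        tied = [j for j, p in enumerate(probs) if best - p <= tol]
        if j_true in tied:
            expected_matches += 1.0 / len(tied)
        if len(tied) == 1:
            if tied[0] == j_true:
                unique_true += 1
            else:
                unique_false += 1
    n_unique = unique_true + unique_false
    return {
        "expected_match_risk": expected_matches,
        "true_match_rate": unique_true / len(targets),
        "false_match_rate": unique_false / n_unique if n_unique else 0.0,
    }


# Hypothetical toy data: (age, income in $1000s). After alteration, the first
# two released records are indistinguishable from each other.
targets = [(30.0, 50.0), (31.0, 51.0), (45.0, 80.0)]
released = [(30.5, 50.5), (30.5, 50.5), (45.0, 80.0)]
risk = dataset_risk(targets, released, true_index=[0, 1, 2], noise_sd=1.0)
print(risk)
```

In this toy file the first two targets each tie between two equally likely released records, so each contributes 1/2 to the expected match risk, while the third target is matched uniquely and correctly; the expected match risk is therefore 2.0 and the true match rate is 1/3. Breaking ties by a uniform random guess is one simple modeling choice for the intruder; other assumptions about intruder behavior would change these summaries.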

Cited Authors

  • Reiter, JP

Published Date

  • December 1, 2005

Published In

Volume / Issue

  • 100 / 472

Start / End Page

  • 1103 - 1112

International Standard Serial Number (ISSN)

  • 0162-1459

Digital Object Identifier (DOI)

  • 10.1198/016214505000000619

Citation Source

  • Scopus