Theoretical limits of microclustering for record linkage.

Publication, Journal Article
Johndrow, JE; Lum, K; Dunson, DB
Published in: Biometrika
June 2018

There has been substantial recent interest in record linkage, where one attempts to group the records pertaining to the same entities from one or more large databases that lack unique identifiers. This can be viewed as a type of microclustering, with few observations per cluster and a very large number of clusters. We show that the problem is fundamentally hard from a theoretical perspective and, even in idealized cases, accurate entity resolution is effectively impossible unless the number of entities is small relative to the number of records and/or the separation between records from different entities is extremely large. These results suggest conservatism in interpretation of the results of record linkage, support collection of additional data to more accurately disambiguate the entities, and motivate a focus on coarser inference. For example, results from a simulation study suggest that sometimes one may obtain accurate results for population size estimation even when fine-scale entity resolution is inaccurate.
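The qualitative claim in the abstract, that linkage accuracy degrades as the number of entities grows relative to their separation, can be illustrated with a small toy simulation. This is only a sketch under stated assumptions and is not the model or the simulation study from the paper: entity locations are drawn uniformly on [0, 1], each record is its entity's location plus Gaussian noise, and the linker is an idealized oracle that knows the true locations and assigns each record to the nearest one. The function name `oracle_linkage_accuracy` and all parameter values are hypothetical choices made for illustration.

```python
import random


def oracle_linkage_accuracy(n_entities, records_per_entity=2, noise_sd=0.01, seed=0):
    """Fraction of records assigned to their true entity by a nearest-center rule.

    Toy setting (an assumption of this sketch, not the paper's model):
    entity locations are uniform on [0, 1] and known to the linker, and
    each record equals its entity's location plus Gaussian noise.
    """
    rng = random.Random(seed)
    centers = [rng.random() for _ in range(n_entities)]
    correct = 0
    total = 0
    for true_id, center in enumerate(centers):
        for _ in range(records_per_entity):
            record = rng.gauss(center, noise_sd)
            # Assign the record to the closest known entity location.
            guess = min(range(n_entities), key=lambda j: abs(record - centers[j]))
            correct += guess == true_id
            total += 1
    return correct / total


# Same noise level, very different numbers of entities on the same interval.
few = oracle_linkage_accuracy(5)
many = oracle_linkage_accuracy(500)
print(f"accuracy with 5 entities:   {few:.2f}")
print(f"accuracy with 500 entities: {many:.2f}")
```

With the noise level held fixed, packing more entities into the same space shrinks the typical gap between neighbouring entities, so even this oracle linker misassigns many records. A real linkage method, which must also estimate the entity locations, would be expected to do no better in this toy setting.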

Published In

Biometrika

DOI

10.1093/biomet/asy003

EISSN

1464-3510

ISSN

0006-3444

Publication Date

June 2018

Volume

105

Issue

2

Start / End Page

431 / 446

Related Subject Headings

  • Statistics & Probability
  • 4905 Statistics
  • 3802 Econometrics
  • 1403 Econometrics
  • 0104 Statistics
  • 0103 Numerical and Computational Mathematics
 

Citation

APA
Johndrow, J. E., Lum, K., & Dunson, D. B. (2018). Theoretical limits of microclustering for record linkage. Biometrika, 105(2), 431–446. https://doi.org/10.1093/biomet/asy003

Chicago
Johndrow, J. E., K. Lum, and D. B. Dunson. “Theoretical limits of microclustering for record linkage.” Biometrika 105, no. 2 (June 2018): 431–46. https://doi.org/10.1093/biomet/asy003.

ICMJE
Johndrow JE, Lum K, Dunson DB. Theoretical limits of microclustering for record linkage. Biometrika. 2018 Jun;105(2):431–46.

MLA
Johndrow, J. E., et al. “Theoretical limits of microclustering for record linkage.” Biometrika, vol. 105, no. 2, June 2018, pp. 431–46. Epmc, doi:10.1093/biomet/asy003.

NLM
Johndrow JE, Lum K, Dunson DB. Theoretical limits of microclustering for record linkage. Biometrika. 2018 Jun;105(2):431–446.