
SMERED: A Bayesian approach to graphical record linkage and de-duplication

Publication, Journal Article
Steorts, RC; Hall, R; Fienberg, SE
Published in: Journal of Machine Learning Research
January 1, 2014

We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation is to represent the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible new representation of the linkage structure naturally allows us to estimate the attributes of the unique observable people in the population, calculate k-way posterior probabilities of matches across records, and propagate the uncertainty of record linkage into later analyses. Our linkage structure lends itself to an efficient, linear-time, hybrid Markov chain Monte Carlo algorithm, which overcomes many obstacles encountered by previously proposed methods of record linkage, despite the high-dimensional parameter space. We assess our results on real and simulated data.
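To make the bipartite linkage structure described in the abstract concrete, here is a minimal sketch, not the authors' implementation, of one way to store a record-to-latent-individual map and to estimate k-way posterior match probabilities from MCMC output. It assumes posterior draws of the linkage structure are already available as Python dicts; the sampler itself is not shown, and the names `Record`, `Linkage`, `linked`, `kway_match_probability`, and `clusters` are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical encoding: a record is (file index, record index within that file).
Record = Tuple[int, int]
# Hypothetical encoding of one linkage structure: each record points to the
# latent individual it represents (one side of the bipartite graph to the other).
Linkage = Dict[Record, Hashable]


def linked(linkage: Linkage, records: Sequence[Record]) -> bool:
    """True if all given records point to the same latent individual.

    Records are linked to each other only indirectly, through a shared
    latent individual; this covers both cross-file links and within-file
    duplicates.
    """
    return len({linkage[r] for r in records}) == 1


def kway_match_probability(samples: List[Linkage], records: Sequence[Record]) -> float:
    """Monte Carlo estimate of P(all records co-refer | data).

    Assumes `samples` are draws of the linkage structure from some posterior
    sampler; the estimate is the share of draws in which every record in
    `records` points to one latent individual.
    """
    if not samples:
        raise ValueError("need at least one posterior sample")
    hits = sum(linked(s, records) for s in samples)
    return hits / len(samples)


def clusters(linkage: Linkage) -> Dict[Hashable, List[Record]]:
    """Invert one linkage structure: group records by latent individual."""
    groups: Dict[Hashable, List[Record]] = {}
    for rec, person in linkage.items():
        groups.setdefault(person, []).append(rec)
    return groups
```

For example, with three made-up posterior draws over records `(0, 0)`, `(0, 1)`, `(1, 0)`, `(2, 0)`, the 3-way probability for `(0, 0)`, `(1, 0)`, `(2, 0)` is the fraction of draws in which all three share a latent label. Because only equality of labels matters, the estimate is unaffected by latent individuals being relabeled arbitrarily from one draw to the next.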

Duke Scholars

Published In

Journal of Machine Learning Research

EISSN

1533-7928

ISSN

1532-4435

Publication Date

January 1, 2014

Volume

33

Start / End Page

922 / 930

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 4905 Statistics
  • 4611 Machine learning
  • 17 Psychology and Cognitive Sciences
  • 08 Information and Computing Sciences
 

Citation

APA: Steorts, R. C., Hall, R., & Fienberg, S. E. (2014). SMERED: A Bayesian approach to graphical record linkage and de-duplication. Journal of Machine Learning Research, 33, 922–930.
Chicago: Steorts, R. C., R. Hall, and S. E. Fienberg. “SMERED: A Bayesian approach to graphical record linkage and de-duplication.” Journal of Machine Learning Research 33 (January 1, 2014): 922–30.
ICMJE: Steorts RC, Hall R, Fienberg SE. SMERED: A Bayesian approach to graphical record linkage and de-duplication. Journal of Machine Learning Research. 2014 Jan 1;33:922–30.
MLA: Steorts, R. C., et al. “SMERED: A Bayesian approach to graphical record linkage and de-duplication.” Journal of Machine Learning Research, vol. 33, Jan. 2014, pp. 922–30.
NLM: Steorts RC, Hall R, Fienberg SE. SMERED: A Bayesian approach to graphical record linkage and de-duplication. Journal of Machine Learning Research. 2014 Jan 1;33:922–930.