
Generalized Bayesian record linkage and regression with exact error propagation

Publication, Conference
Steorts, RC; Tancredi, A; Liseo, B
Published in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
January 1, 2018

Record linkage (de-duplication or entity resolution) is the process of merging noisy databases to remove duplicate entities. Once record linkage has removed the duplicates, the downstream task is any inferential, predictive, or other post-linkage analysis performed on the linked data. One goal of the downstream task is to obtain a larger reference data set, which allows more accurate statistical analyses. However, the uncertainty inherent in record linkage is passed on to the downstream task. Motivated by this, we propose a generalized Bayesian record linkage method and consider multiple regression analysis as the downstream task. Records are linked via a random partition model, which allows a wide class of models to be considered. Moreover, we jointly model the record linkage and the downstream task, which allows the record linkage uncertainty to be accounted for exactly. The joint model also provides a feedback mechanism that propagates information from the proposed Bayesian record linkage model into the downstream task. This feedback effect is essential for eliminating potential biases that can jeopardize the resulting downstream analysis. We apply our methodology to multiple linear regression and illustrate empirically that the "feedback effect" can improve the performance of record linkage.
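The central idea above, that uncertainty about which records match should flow into the regression instead of being fixed at one "best" linkage, can be illustrated with a small Python sketch. This is not the authors' joint model. It is a simpler two-stage Monte Carlo illustration under stated assumptions: two files (file A holding a covariate x, file B holding a response y); linkage draws already available (for example, from some posterior sampler) as lists of one-to-one (index in A, index in B) pairs; and a one-predictor least-squares fit under each draw. All function names and the toy numbers are hypothetical. Unlike the joint model described in the abstract, nothing here feeds regression information back into the linkage.

```python
from statistics import mean, pvariance


def check_one_to_one(links):
    """Raise if a record in file A or file B is linked more than once.

    Assumption of this sketch: with two files, a linkage is a one-to-one
    (bipartite) matching, so each record has at most one partner.
    """
    a_idx = [i for i, _ in links]
    b_idx = [j for _, j in links]
    if len(set(a_idx)) != len(a_idx) or len(set(b_idx)) != len(b_idx):
        raise ValueError("each record may be linked to at most one record in the other file")


def ols_line(xs, ys):
    """Least-squares intercept and slope for y = a + b * x (one predictor)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def slope_under_linkage_uncertainty(file_a, file_b, linkage_draws):
    """Fit the regression once per linkage draw.

    Returns the per-draw slopes, their mean, and their (population) variance,
    i.e. the spread of the slope that is due only to linkage uncertainty.
    Records left unlinked in a draw are simply excluded from that fit.
    """
    slopes = []
    for links in linkage_draws:
        check_one_to_one(links)
        xs = [file_a[i] for i, _ in links]
        ys = [file_b[j] for _, j in links]
        _, slope = ols_line(xs, ys)
        slopes.append(slope)
    return slopes, mean(slopes), pvariance(slopes)


if __name__ == "__main__":
    # Hypothetical toy data: covariate x in file A, response y in file B.
    file_a = [1.0, 2.0, 3.0, 4.0]
    file_b = [2.1, 3.9, 6.2, 7.8]
    draws = [
        [(0, 0), (1, 1), (2, 2), (3, 3)],  # draw 1: records matched in order
        [(0, 0), (1, 2), (2, 1), (3, 3)],  # draw 2: middle two records swapped
    ]
    slopes, avg, var = slope_under_linkage_uncertainty(file_a, file_b, draws)
    print(slopes, avg, var)
```

With these toy numbers the two draws give slopes of about 1.94 and 1.48. That spread is the part of the slope's uncertainty caused by not knowing the true linkage, which a single fixed linkage would hide. A fuller two-stage treatment would also add the within-draw sampling variance of each fit; the joint model in the paper goes further still by letting the regression inform the linkage.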


Published In

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

DOI

10.1007/978-3-319-99771-1_20

EISSN

1611-3349

ISSN

0302-9743

Publication Date

January 1, 2018

Volume

11126 LNCS

Start / End Page

297 / 313

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences
 

Citation

APA
Steorts, R. C., Tancredi, A., & Liseo, B. (2018). Generalized Bayesian record linkage and regression with exact error propagation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11126 LNCS, pp. 297–313). https://doi.org/10.1007/978-3-319-99771-1_20

Chicago
Steorts, R. C., A. Tancredi, and B. Liseo. "Generalized Bayesian record linkage and regression with exact error propagation." In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11126 LNCS:297–313, 2018. https://doi.org/10.1007/978-3-319-99771-1_20.

ICMJE
Steorts RC, Tancredi A, Liseo B. Generalized Bayesian record linkage and regression with exact error propagation. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2018. p. 297–313.

MLA
Steorts, R. C., et al. "Generalized Bayesian record linkage and regression with exact error propagation." Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 11126 LNCS, 2018, pp. 297–313. Scopus, doi:10.1007/978-3-319-99771-1_20.

NLM
Steorts RC, Tancredi A, Liseo B. Generalized Bayesian record linkage and regression with exact error propagation. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2018. p. 297–313.
