Bayesian Modeling for Simultaneous Regression and Record Linkage

Publication, Conference
Tang, J; Reiter, JP; Steorts, RC
Published in: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics
January 1, 2020

Often data analysts use probabilistic record linkage techniques to match records across two data sets. Such matching can be the primary goal, or it can be a necessary step to analyze relationships among the variables in the data sets. We propose a Bayesian hierarchical model that allows data analysts to perform simultaneous linear regression and probabilistic record linkage. This allows analysts to leverage relationships among the variables to improve linkage quality. Further, it enables analysts to propagate uncertainty in a principled way, while also potentially offering more accurate estimates of regression parameters compared to approaches that use a two-step process, i.e., link the records first, then estimate the linear regression on the linked data. We propose and evaluate three Markov chain Monte Carlo algorithms for implementing the Bayesian model, which we compare against a two-step process.
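To make the two-step baseline mentioned above concrete, here is a minimal sketch in Python. It is not the authors' model or code. As a simplifying assumption, it replaces probabilistic record linkage with a hard rule (link each record to the candidate that agrees on the most linking fields) and does not enforce one-to-one matching. The function name `two_step_estimate`, the simulated data, and the 10% field-corruption rate are all hypothetical.

```python
import numpy as np


def two_step_estimate(fields_a, x_a, fields_b, y_b):
    """Two-step baseline: link records first, then fit a linear regression.

    Assumption (not from the paper): linkage uses a simple agreement count
    rather than a probabilistic record linkage model.
    """
    # agreement[i, j] = number of linking fields on which A[i] and B[j] agree
    agreement = (fields_a[:, None, :] == fields_b[None, :, :]).sum(axis=2)
    # Step 1: a single hard linkage decision per record in file A
    links = agreement.argmax(axis=1)
    # Step 2: ordinary least squares of y on x over the linked pairs
    design = np.column_stack([np.ones(len(x_a)), x_a])
    beta, *_ = np.linalg.lstsq(design, y_b[links], rcond=None)
    return links, beta


# Simulated example: file A holds x, file B holds y, both share linking fields.
rng = np.random.default_rng(0)
n = 200
fields_a = rng.integers(0, 10, size=(n, 4))  # hypothetical categorical fields
x_a = rng.normal(size=n)
y = 1.0 + 2.0 * x_a + rng.normal(scale=0.5, size=n)

perm = rng.permutation(n)  # file B stores the same entities in shuffled order
fields_b = fields_a[perm].copy()
corrupt = rng.random(fields_b.shape) < 0.1  # hypothetical 10% field error rate
fields_b[corrupt] = rng.integers(0, 10, size=corrupt.sum())
y_b = y[perm]

links, beta = two_step_estimate(fields_a, x_a, fields_b, y_b)
print("share of correct links:", np.mean(perm[links] == np.arange(n)))
print("estimated intercept and slope:", beta)
```

Because the links are fixed before the regression is fit, any linkage errors pass into the coefficient estimates without being reflected in their uncertainty. That is the limitation the joint Bayesian model described in the abstract is meant to address.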

Published In

Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics

DOI

10.1007/978-3-030-57521-2_15

EISSN

1611-3349

ISSN

0302-9743

Publication Date

January 1, 2020

Volume

12276 LNCS

Start / End Page

209 / 223

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences
 

Citation

APA: Tang, J., Reiter, J. P., & Steorts, R. C. (2020). Bayesian Modeling for Simultaneous Regression and Record Linkage. In Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics (Vol. 12276 LNCS, pp. 209–223). https://doi.org/10.1007/978-3-030-57521-2_15

Chicago: Tang, J., J. P. Reiter, and R. C. Steorts. “Bayesian Modeling for Simultaneous Regression and Record Linkage.” In Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, 12276 LNCS:209–23, 2020. https://doi.org/10.1007/978-3-030-57521-2_15.

ICMJE: Tang J, Reiter JP, Steorts RC. Bayesian Modeling for Simultaneous Regression and Record Linkage. In: Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. 2020. p. 209–23.

MLA: Tang, J., et al. “Bayesian Modeling for Simultaneous Regression and Record Linkage.” Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics, vol. 12276 LNCS, 2020, pp. 209–23. Scopus, doi:10.1007/978-3-030-57521-2_15.

NLM: Tang J, Reiter JP, Steorts RC. Bayesian Modeling for Simultaneous Regression and Record Linkage. Lecture Notes in Computer Science Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics. 2020. p. 209–223.
