Bayesian Modeling for Simultaneous Regression and Record Linkage
Data analysts often use probabilistic record linkage techniques to match records across two data sets. Such matching can be the primary goal, or it can be a necessary step for analyzing relationships among the variables in the data sets. We propose a Bayesian hierarchical model that allows data analysts to perform linear regression and probabilistic record linkage simultaneously. This lets analysts leverage relationships among the variables to improve linkage quality. It also enables analysts to propagate uncertainty in a principled way, and it may offer more accurate estimates of regression parameters than a two-step process, i.e., linking the records first and then estimating the linear regression on the linked data. We propose and evaluate three Markov chain Monte Carlo algorithms for implementing the Bayesian model and compare them against a two-step process.
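The central idea above, that relationships among variables can improve linkage, can be illustrated with a toy calculation. The sketch below is a hypothetical simplification, not the authors' model or any of their three MCMC algorithms. It assumes a single record in file A with response y, a short list of candidate matches in file B with covariate x, positive comparison-based weights for each candidate, and fixed values of the regression parameters. Multiplying each weight by the regression likelihood and normalizing (Bayes' rule) gives each candidate's conditional match probability, roughly one step of a Gibbs-style sampler. The function name `link_posterior` and all numbers are invented for illustration.

```python
from statistics import NormalDist


def link_posterior(y_a, x_candidates, prior_weights, beta0, beta1, sigma):
    """Toy conditional link update for one record (hypothetical, not the
    authors' model).

    y_a: response observed on a record in file A.
    x_candidates: covariate values of candidate matches in file B.
    prior_weights: positive weights from field comparisons (e.g. names, dates).
    beta0, beta1, sigma: current regression parameters, y = beta0 + beta1*x + noise.
    Returns the posterior probability that each candidate is the true match.
    """
    unnormalized = [
        # Comparison-based prior weight times the regression likelihood of y_a.
        w * NormalDist(mu=beta0 + beta1 * x, sigma=sigma).pdf(y_a)
        for w, x in zip(prior_weights, x_candidates)
    ]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Two candidates look equally good on comparison fields alone (weights 0.5, 0.5),
# but only the second one is consistent with the regression y = 1 + 2x.
probs = link_posterior(y_a=5.0, x_candidates=[1.0, 2.0], prior_weights=[0.5, 0.5],
                       beta0=1.0, beta1=2.0, sigma=1.0)
print([round(p, 3) for p in probs])  # → [0.119, 0.881]
```

A full model of this kind would also need a "no match" option, a one-to-one constraint across records, and updates of the regression parameters given the current links; this sketch omits all of these.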
Published In
DOI
EISSN
ISSN
Publication Date
Volume
Start / End Page
Related Subject Headings
- Artificial Intelligence & Image Processing
- 46 Information and computing sciences