Scholars@Duke publication: Regression Modeling and File Matching Using Possibly Erroneous Matching Variables

Regression Modeling and File Matching Using Possibly Erroneous Matching Variables

Publication, Journal Article

Dalzell, NM; Reiter, JP

Published in: Journal of Computational and Graphical Statistics

October 2, 2018

Many analyses require linking records from two databases comprising overlapping sets of individuals. In the absence of unique identifiers, the linkage procedure often involves matching on a set of categorical variables, such as demographics, common to both files. Typically, however, the resulting matches are inexact: some cross-classifications of the matching variables do not generate unique links across files. Further, the variables used for matching can be subject to reporting errors, which introduce additional uncertainty in analyses. We present a Bayesian file matching methodology designed to estimate regression models and match records simultaneously when categorical variables used for matching are subject to errors. The method relies on a hierarchical model that includes (1) the regression of interest involving variables from the two files given a vector indicating the links, (2) a model for the linking vector given the true values of the variables used for matching, (3) a model for reported values of the variables used for matching given their true values, and (4) a model for the true values of the variables used for matching. We describe algorithms for sampling from the posterior distribution of the model. We illustrate the methodology using artificial data and data from education records in the state of North Carolina.
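The inexact matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' Bayesian method: it only groups records by the cross-classification of a few categorical matching variables and lists the candidate links. The two toy files, the field names (sex, birth_year, county), and the helper functions are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict

# Hypothetical toy files; field names and values are illustrative assumptions.
file_a = [
    {"id": "a1", "sex": "F", "birth_year": 2001, "county": "Durham"},
    {"id": "a2", "sex": "F", "birth_year": 2001, "county": "Durham"},
    {"id": "a3", "sex": "M", "birth_year": 2000, "county": "Wake"},
    {"id": "a4", "sex": "M", "birth_year": 1999, "county": "Orange"},
]
file_b = [
    {"id": "b1", "sex": "F", "birth_year": 2001, "county": "Durham"},
    {"id": "b2", "sex": "F", "birth_year": 2001, "county": "Durham"},
    {"id": "b3", "sex": "M", "birth_year": 2000, "county": "Wake"},
    # birth_year differs from a4 by one year, as a reporting error might cause
    {"id": "b4", "sex": "M", "birth_year": 1998, "county": "Orange"},
]

MATCH_VARS = ("sex", "birth_year", "county")


def cell(record):
    """Return the cross-classification cell of the matching variables."""
    return tuple(record[v] for v in MATCH_VARS)


def candidate_links(a, b):
    """Map each record id in `a` to the ids in `b` that share its cell."""
    by_cell = defaultdict(list)
    for rec in b:
        by_cell[cell(rec)].append(rec["id"])
    return {rec["id"]: list(by_cell.get(cell(rec), [])) for rec in a}


links = candidate_links(file_a, file_b)
for a_id, b_ids in links.items():
    if len(b_ids) == 1:
        status = "unique link"
    elif b_ids:
        status = "ambiguous"
    else:
        status = "no exact match"
    print(a_id, b_ids, status)
```

In this toy example, a1 and a2 fall in the same cell as both b1 and b2, so exact matching cannot tell them apart, while a4 finds no exact match because one matching variable disagrees. These are the two sources of uncertainty (non-unique cells and erroneous matching variables) that the abstract says the hierarchical model is designed to handle.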

Duke Scholars

Author: Jerome P. Reiter, Statistical Science

Published In

Journal of Computational and Graphical Statistics

DOI

10.1080/10618600.2018.1458624

EISSN

1537-2715

ISSN

1061-8600

Publication Date

October 2, 2018

Volume

27

Issue

4

Start / End Page

728 / 738

Related Subject Headings

Statistics & Probability
4905 Statistics
1403 Econometrics
0104 Statistics

Citation

APA: Dalzell, N. M., & Reiter, J. P. (2018). Regression Modeling and File Matching Using Possibly Erroneous Matching Variables. Journal of Computational and Graphical Statistics, 27(4), 728–738. https://doi.org/10.1080/10618600.2018.1458624

Chicago: Dalzell, N. M., and J. P. Reiter. “Regression Modeling and File Matching Using Possibly Erroneous Matching Variables.” Journal of Computational and Graphical Statistics 27, no. 4 (October 2, 2018): 728–38. https://doi.org/10.1080/10618600.2018.1458624.

ICMJE: Dalzell NM, Reiter JP. Regression Modeling and File Matching Using Possibly Erroneous Matching Variables. Journal of Computational and Graphical Statistics. 2018 Oct 2;27(4):728–38.

MLA: Dalzell, N. M., and J. P. Reiter. “Regression Modeling and File Matching Using Possibly Erroneous Matching Variables.” Journal of Computational and Graphical Statistics, vol. 27, no. 4, Oct. 2018, pp. 728–38. Scopus, doi:10.1080/10618600.2018.1458624.

NLM: Dalzell NM, Reiter JP. Regression Modeling and File Matching Using Possibly Erroneous Matching Variables. Journal of Computational and Graphical Statistics. 2018 Oct 2;27(4):728–738.
