Bayesian marked point process modeling for generating fully synthetic public use data with point-referenced geography
Many data stewards collect confidential data that include fine geography. When sharing these data with others, data stewards strive to disseminate data that are informative for a wide range of spatial and non-spatial analyses while simultaneously protecting the confidentiality of data subjects' identities and attributes. Typically, data stewards meet this challenge by coarsening the resolution of the released geography and, as needed, perturbing the confidential attributes. When done with high intensity, these redaction strategies can result in released data with poor analytic quality. We propose an alternative dissemination approach based on fully synthetic data. We generate data using marked point process models that can maintain both the statistical properties and the spatial dependence structure of the confidential data. We illustrate the approach using data consisting of mortality records from Durham, North Carolina.
Duke Scholars
Published In
DOI
ISSN
Publication Date
Volume
Start / End Page
Related Subject Headings
- 4905 Statistics
- 0801 Artificial Intelligence and Image Processing
- 0104 Statistics