Scholars@Duke publication: Protecting Confidentiality in Cancer Registry Data With Geographic Identifiers.

Publication, Journal Article

Yu, M; Reiter, JP; Zhu, L; Liu, B; Cronin, KA; Feuer, EJR

Published in: American journal of epidemiology

July 2017

The National Cancer Institute's Surveillance, Epidemiology, and End Results Program releases research files of cancer registry data. These files include geographic information at the county level, but no finer. Access to finer geography, such as census tract identifiers, would enable richer analyses, for example, examination of health disparities across neighborhoods. To date, tract identifiers have been left off the research files because they could compromise the confidentiality of patients' identities. We present an approach to inclusion of tract identifiers based on multiply imputed, synthetic data. The idea is to build a predictive model of tract locations, given patient and tumor characteristics, and randomly simulate the tract of each patient by sampling from this model. For the predictive model, we use multivariate regression trees fitted to the latitude and longitude of the population centroid of each tract. We implement the approach in the registry data from California. The method results in synthetic data that reproduce a wide range (but not all) of analyses of census tract socioeconomic cancer disparities and have relatively low disclosure risks, which we assess by comparing individual patients' actual and synthetic tract locations. We conclude with a discussion of how synthetic data sets can be used by researchers with cancer registry data.
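The approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified CART-style tree written with the Python standard library only, fitted jointly to the latitude and longitude of tract centroids. The sampling rule (draw a synthetic tract from the tracts observed in the patient's leaf), the toy data, the tract IDs, and all function names are assumptions made for illustration. The match rate at the end is only a crude stand-in for the disclosure risk comparison the abstract mentions.

```python
import random

# Hypothetical tract centroids (lat, lon); not real registry data.
CENTROIDS = {"A": (38.5, -121.5), "B": (38.6, -121.4),
             "C": (34.0, -118.2), "D": (34.1, -118.3)}

def sse(points):
    """Sum of squared deviations of (lat, lon) pairs from their mean."""
    if not points:
        return 0.0
    n = len(points)
    mlat = sum(p[0] for p in points) / n
    mlon = sum(p[1] for p in points) / n
    return sum((p[0] - mlat) ** 2 + (p[1] - mlon) ** 2 for p in points)

def fit_tree(rows, min_leaf=5):
    """Fit a tree on rows of (features, (lat, lon), tract_id).

    Each split minimizes the joint squared error of latitude and longitude,
    which is what makes the tree multivariate in its response.
    """
    best = None
    if len(rows) >= 2 * min_leaf:
        parent = sse([r[1] for r in rows])
        for j in range(len(rows[0][0])):
            # Candidate thresholds: every observed value except the largest.
            for t in sorted({r[0][j] for r in rows})[:-1]:
                left = [r for r in rows if r[0][j] <= t]
                right = [r for r in rows if r[0][j] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                gain = parent - sse([r[1] for r in left]) - sse([r[1] for r in right])
                if gain > 1e-12 and (best is None or gain > best[0]):
                    best = (gain, j, t, left, right)
    if best is None:
        # Leaf: remember the observed tracts so we can sample from them later.
        return {"tracts": [r[2] for r in rows]}
    _, j, t, left, right = best
    return {"feature": j, "threshold": t,
            "left": fit_tree(left, min_leaf), "right": fit_tree(right, min_leaf)}

def leaf_tracts(tree, features):
    """Walk down the tree and return the tracts observed in the matching leaf."""
    while "tracts" not in tree:
        branch = "left" if features[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["tracts"]

def synthesize(tree, rows, m=5, seed=0):
    """Create m synthetic tract assignments (assumed rule: sample within the leaf)."""
    rng = random.Random(seed)
    return [[rng.choice(leaf_tracts(tree, r[0])) for r in rows] for _ in range(m)]

def match_rate(rows, synthetic):
    """Share of patients whose synthetic tract equals their actual tract."""
    return sum(r[2] == s for r, s in zip(rows, synthetic)) / len(rows)

# Toy patients: features are (age_group, stage); age_group separates north from south.
rows = []
for i in range(40):
    north = i % 2 == 0
    tract = ("A" if i % 4 == 0 else "B") if north else ("C" if i % 4 == 1 else "D")
    rows.append(((0 if north else 1, i % 3), CENTROIDS[tract], tract))

tree = fit_tree(rows)
synthetic_sets = synthesize(tree, rows, m=5, seed=1)
rates = [match_rate(rows, s) for s in synthetic_sets]
print(rates)
```

Releasing several synthetic data sets rather than one reflects the multiple imputation idea in the abstract: analysts can run the same analysis on each set and combine the results. A lower match rate suggests that fewer patients keep their true tract, although a full risk assessment would need more than this single number.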

Duke Scholars

Jerome P. Reiter, Author, Statistical Science

Published In

American journal of epidemiology

DOI

10.1093/aje/kwx050

EISSN

1476-6256

ISSN

0002-9262

Publication Date

July 2017

Volume

186

Issue

1

Start / End Page

83 / 91

Related Subject Headings

Young Adult
Socioeconomic Factors
Small-Area Analysis
Sex Distribution
SEER Program
Registries
Racial Groups
Neoplasms
Middle Aged
Male

Citation

APA

Yu, M., Reiter, J. P., Zhu, L., Liu, B., Cronin, K. A., & Feuer, E. J. R. (2017). Protecting Confidentiality in Cancer Registry Data With Geographic Identifiers. American Journal of Epidemiology, 186(1), 83–91. https://doi.org/10.1093/aje/kwx050

Chicago

Yu, Mandi, Jerome Phillip Reiter, Li Zhu, Benmei Liu, Kathleen A. Cronin, and Eric J Rocky Feuer. “Protecting Confidentiality in Cancer Registry Data With Geographic Identifiers.” American Journal of Epidemiology 186, no. 1 (July 2017): 83–91. https://doi.org/10.1093/aje/kwx050.

ICMJE

Yu M, Reiter JP, Zhu L, Liu B, Cronin KA, Feuer EJR. Protecting Confidentiality in Cancer Registry Data With Geographic Identifiers. American journal of epidemiology. 2017 Jul;186(1):83–91.

MLA

Yu, Mandi, et al. “Protecting Confidentiality in Cancer Registry Data With Geographic Identifiers.” American Journal of Epidemiology, vol. 186, no. 1, July 2017, pp. 83–91. Epmc, doi:10.1093/aje/kwx050.

NLM

Yu M, Reiter JP, Zhu L, Liu B, Cronin KA, Feuer EJR. Protecting Confidentiality in Cancer Registry Data With Geographic Identifiers. American journal of epidemiology. 2017 Jul;186(1):83–91.
