
A guide to large data sets for population-based cancer research: Strengths, limitations, and pitfalls.

Publication, Journal Article
Martin, AN; Chan, NW; Cheung, DC; Fong, ZV
Published in: Cancer
November 15, 2024

With the proliferation of cancer research based on large databases, misalignment of research questions and data set capabilities is inevitable. Nationally maintained databases are appealing to cancer researchers because of the ease of access to large amounts of patient data available for analysis and risk estimation. Data sets that are commonly used in cancer research include the National Cancer Database, the SEER (Surveillance, Epidemiology, and End Results) program of the National Cancer Institute, the SEER-Medicare database, the American College of Surgeons National Surgical Quality Improvement Program, and the Healthcare Cost and Utilization Project databases, among others. Each data set has pros and cons with respect to variable availability and the ability to analyze cancer-specific outcomes. It is critical for researchers to understand the strengths and limitations of each database. Changing variable definitions, the length of postoperative data collection, and the availability of patient-reported outcomes or social determinants of health data are examples of factors that researchers must consider when selecting a data set for research purposes. For the current review, the authors summarized the advantages and disadvantages of various national data sets for cohort studies in cancer populations.

Published In

Cancer

DOI

10.1002/cncr.35535

EISSN

1097-0142

Publication Date

November 15, 2024

Volume

130

Issue

22

Start / End Page

3802 / 3814

Location

United States

Related Subject Headings

  • United States
  • SEER Program
  • Oncology & Carcinogenesis
  • Neoplasms
  • Humans
  • Databases, Factual
  • Biomedical Research
  • 4206 Public health
  • 3211 Oncology and carcinogenesis
  • 1117 Public Health and Health Services
 

Citation

APA: Martin, A. N., Chan, N. W., Cheung, D. C., & Fong, Z. V. (2024). A guide to large data sets for population-based cancer research: Strengths, limitations, and pitfalls. Cancer, 130(22), 3802–3814. https://doi.org/10.1002/cncr.35535

Chicago: Martin, Allison N., Norine W. Chan, Dillon C. Cheung, and Zhi Ven Fong. “A guide to large data sets for population-based cancer research: Strengths, limitations, and pitfalls.” Cancer 130, no. 22 (November 15, 2024): 3802–14. https://doi.org/10.1002/cncr.35535.

ICMJE: Martin AN, Chan NW, Cheung DC, Fong ZV. A guide to large data sets for population-based cancer research: Strengths, limitations, and pitfalls. Cancer. 2024 Nov 15;130(22):3802–14.

MLA: Martin, Allison N., et al. “A guide to large data sets for population-based cancer research: Strengths, limitations, and pitfalls.” Cancer, vol. 130, no. 22, Nov. 2024, pp. 3802–14. Pubmed, doi:10.1002/cncr.35535.

NLM: Martin AN, Chan NW, Cheung DC, Fong ZV. A guide to large data sets for population-based cancer research: Strengths, limitations, and pitfalls. Cancer. 2024 Nov 15;130(22):3802–3814.