Clinical Data Prediction Model to Identify Patients With Early-Stage Pancreatic Cancer.

Publication, Journal Article
Chen, Q; Cherry, DR; Nalawade, V; Qiao, EM; Kumar, A; Lowy, AM; Simpson, DR; Murphy, JD
Published in: JCO clinical cancer informatics
March 2021

Pancreatic cancer is an aggressive malignancy, and patients often experience nonspecific symptoms before diagnosis. This study evaluates a machine learning approach to help identify patients with early-stage pancreatic cancer from clinical data within electronic health records (EHRs).

From the Optum deidentified EHR data set, we identified early-stage (n = 3,322) and late-stage (n = 25,908) pancreatic cancer cases in patients over 40 years of age diagnosed between 2009 and 2017. Patients with early-stage pancreatic cancer were matched to noncancer controls (1:16 match). We constructed a prediction model using eXtreme Gradient Boosting (XGBoost) to identify early-stage patients on the basis of 18,220 features within the EHR, including diagnoses, procedures, information within clinical notes, and medications. Model accuracy was assessed with sensitivity, specificity, positive predictive value, and the area under the curve.

The final predictive model included 582 predictive features from the EHR, including 248 (42.5%) physician note elements, 146 (25.0%) procedure codes, 91 (15.6%) diagnosis codes, 89 (15.3%) medications, and 9 (1.5%) demographic features. The final model area under the curve was 0.84. Choosing a model cut point with a sensitivity of 60% and specificity of 90% would enable early detection of 58% of late-stage patients a median of 24 months before their actual diagnosis.

Prediction models using EHR data show promise in the early detection of pancreatic cancer. Although widespread use of this approach on an unselected population would produce high rates of false-positive tests, this technique may be rapidly impactful if deployed among high-risk patients or paired with other imaging or biomarker screening tools.
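The point about false positives in an unselected population follows from Bayes' rule: at a fixed sensitivity and specificity, positive predictive value (PPV) falls as disease prevalence falls. The sketch below illustrates this at the quoted cut point (60% sensitivity, 90% specificity). The prevalence implied by the 1:16 match (1 case per 17 people) comes from the abstract; the "high-risk" and "unselected" prevalences are hypothetical values chosen only for illustration, and the function name is an assumption, not something from the paper.

```python
def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """PPV via Bayes' rule: true positives / all positive tests."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Operating point quoted in the abstract.
SENSITIVITY = 0.60
SPECIFICITY = 0.90

scenarios = {
    # Prevalence implied by the 1:16 case-control match.
    "matched cohort (1:16)": 1 / 17,
    # Hypothetical prevalences, NOT taken from the paper.
    "hypothetical high-risk group (1%)": 0.01,
    "hypothetical unselected population (0.01%)": 0.0001,
}

ppv_by_scenario = {
    name: positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    for name, prevalence in scenarios.items()
}

for name, ppv in ppv_by_scenario.items():
    print(f"{name}: PPV = {ppv:.2%}")
```

Under these assumptions, PPV is roughly 27% in the matched cohort, under 6% at 1% prevalence, and well under 0.1% at 0.01% prevalence. The abstract does not report a PPV, so these numbers are illustrative only; they show why the authors suggest deployment among high-risk patients rather than an unselected population.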

Published In

JCO clinical cancer informatics

DOI

10.1200/cci.20.00137

EISSN

2473-4276

ISSN

2473-4276

Publication Date

March 2021

Volume

5

Start / End Page

279 / 287

Related Subject Headings

  • Predictive Value of Tests
  • Pancreatic Neoplasms
  • Machine Learning
  • Humans
  • Electronic Health Records
  • Early Detection of Cancer
 

Citation

APA
Chen, Q., Cherry, D. R., Nalawade, V., Qiao, E. M., Kumar, A., Lowy, A. M., … Murphy, J. D. (2021). Clinical Data Prediction Model to Identify Patients With Early-Stage Pancreatic Cancer. JCO Clinical Cancer Informatics, 5, 279–287. https://doi.org/10.1200/cci.20.00137

Chicago
Chen, Qinyu, Daniel R. Cherry, Vinit Nalawade, Edmund M. Qiao, Abhishek Kumar, Andrew M. Lowy, Daniel R. Simpson, and James D. Murphy. “Clinical Data Prediction Model to Identify Patients With Early-Stage Pancreatic Cancer.” JCO Clinical Cancer Informatics 5 (March 2021): 279–87. https://doi.org/10.1200/cci.20.00137.

ICMJE
Chen Q, Cherry DR, Nalawade V, Qiao EM, Kumar A, Lowy AM, et al. Clinical Data Prediction Model to Identify Patients With Early-Stage Pancreatic Cancer. JCO clinical cancer informatics. 2021 Mar;5:279–87.

MLA
Chen, Qinyu, et al. “Clinical Data Prediction Model to Identify Patients With Early-Stage Pancreatic Cancer.” JCO Clinical Cancer Informatics, vol. 5, Mar. 2021, pp. 279–87. Epmc, doi:10.1200/cci.20.00137.

NLM
Chen Q, Cherry DR, Nalawade V, Qiao EM, Kumar A, Lowy AM, Simpson DR, Murphy JD. Clinical Data Prediction Model to Identify Patients With Early-Stage Pancreatic Cancer. JCO clinical cancer informatics. 2021 Mar;5:279–287.
