Race-specific survival prediction models for de novo metastatic breast cancer using machine learning.
Abstract e13116

Background: Breast cancer is the most common malignancy among women and a major cause of cancer death. About 5–10% of patients with breast cancer are initially diagnosed at stage IV (de novo metastatic breast cancer). Survival differs significantly between racial groups, yet existing prognostic models often ignore the characteristics of specific racial groups.

Methods: Data were obtained from the National Cancer Database (2010–2018) and included 48,354 patients with stage IV breast cancer: 36,505 non-Hispanic White (White), 8,852 non-Hispanic Black (Black), and 2,997 Hispanic patients. Each cohort was split into a 70% training set and a 30% validation set. Twenty variables were selected through univariate and multivariate analyses combined with clinical relevance. Six machine learning methods (Lasso-Cox, random survival forest [RSF], XGBoost, GBM, Superpc, and plsRcox) were used to build survival prediction models. The models estimated survival probabilities and stratified patients into high- and low-risk groups.

Results: Hispanic patients had the best prognosis (3-year overall survival [OS]: 0.56, 95% CI: 0.54–0.58; 5-year OS: 0.40, 95% CI: 0.38–0.42), whereas Black patients had the poorest outcomes (3-year OS: 0.38, 95% CI: 0.37–0.39; 5-year OS: 0.24, 95% CI: 0.23–0.25). In all three racial cohorts, the RSF model had the best predictive performance. For White patients, the RSF model achieved a 1-year AUC of 0.84 (95% CI: 0.83–0.84, training) and 0.80 (95% CI: 0.79–0.81, validation); a 3-year AUC of 0.80 (95% CI: 0.795–0.806, training) and 0.74 (95% CI: 0.73–0.75, validation); and a 5-year AUC of 0.78 (95% CI: 0.77–0.79, training) and 0.72 (95% CI: 0.71–0.73, validation). For Black patients, the RSF model achieved a 1-year AUC of 0.85 (95% CI: 0.84–0.86, training) and 0.79 (95% CI: 0.77–0.81, validation); a 3-year AUC of 0.82 (95% CI: 0.81–0.83, training) and 0.72 (95% CI: 0.71–0.74, validation); and a 5-year AUC of 0.80 (95% CI: 0.79–0.81, training) and 0.69 (95% CI: 0.67–0.71, validation).
For Hispanic patients, the RSF model achieved a 1-year AUC of 0.89 (95% CI: 0.88–0.91, training) and 0.79 (95% CI: 0.75–0.83, validation); a 3-year AUC of 0.87 (95% CI: 0.85–0.88, training) and 0.72 (95% CI: 0.69–0.76, validation); and a 5-year AUC of 0.84 (95% CI: 0.82–0.86, training) and 0.67 (95% CI: 0.64–0.71, validation). Risk stratification based on the RSF model showed significant survival differences between the high- and low-risk groups in all cohorts (p < 0.001).

Conclusions: This study used six machine learning methods to develop race-specific, time-dependent survival prediction models for de novo metastatic breast cancer, underscoring the importance of accounting for racial differences among patients with breast cancer. These models are expected to help assess survival prognosis for patients of different races and to guide treatment intensification for high-risk groups. Future studies will focus on external validation to improve the generalizability and clinical applicability of the models.
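The abstract does not state how the 3- and 5-year OS figures or the high- versus low-risk comparison were computed. Kaplan-Meier estimation and a log-rank test are common choices for this kind of analysis, so the stdlib-only sketch below assumes them. The median cutoff for risk groups, the toy data, and the function names (`kaplan_meier`, `split_by_median`, `logrank_p`) are hypothetical illustrations, not the authors' code. Risk scores would come from a fitted model such as the RSF. Confidence intervals and time-dependent AUC are not covered here.

```python
import math
import statistics
from typing import List, Sequence


def kaplan_meier(times: Sequence[float], events: Sequence[int], horizon: float) -> float:
    """Kaplan-Meier estimate of S(horizon). events: 1 = death, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        deaths = removed = 0
        # Pool every subject whose follow-up ends at this exact time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
        at_risk -= removed  # deaths and censored subjects leave the risk set
    return surv


def split_by_median(risk_scores: Sequence[float]) -> List[int]:
    """Hypothetical cutoff: 1 = high risk (score above the median), 0 = low risk."""
    cutoff = statistics.median(risk_scores)
    return [1 if r > cutoff else 0 for r in risk_scores]


def logrank_p(times: Sequence[float], events: Sequence[int], groups: Sequence[int]) -> float:
    """Two-group log-rank test; returns the p-value (chi-square, 1 df)."""
    data = sorted(zip(times, events, groups))
    n = len(data)
    n1 = sum(groups)  # subjects currently at risk in group 1 (high risk)
    o_minus_e = 0.0
    var = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        d = d1 = removed = removed1 = 0
        while i < len(data) and data[i][0] == t:
            _, e, g = data[i]
            d += e
            d1 += e * g
            removed += 1
            removed1 += g
            i += 1
        if d and n > 1:
            # Observed minus expected deaths in group 1, with hypergeometric variance.
            o_minus_e += d1 - d * n1 / n
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
        n -= removed
        n1 -= removed1
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical toy data (time in months, 1 = death); scores stand in for model output.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 0, 1, 0, 1, 0]
risk = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

groups = split_by_median(risk)
print(groups)  # [1, 1, 1, 1, 0, 0, 0, 0]
print(round(kaplan_meier(times, events, horizon=3), 3))  # 0.625
print(logrank_p(times, events, groups))
```

Ties are pooled at each event time so that the risk set shrinks correctly. The p-value uses the closed-form chi-square tail for 1 degree of freedom, which avoids any dependency beyond the standard library.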
Related Subject Headings
- Oncology & Carcinogenesis
- 3211 Oncology and carcinogenesis
- 1112 Oncology and Carcinogenesis
- 1103 Clinical Sciences