A Two-Step Variable Selection Strategy for Multiply Imputed Survival Data Using Penalized Cox Models.
Multiple imputation (MI) is widely used for handling missing data. However, applying penalized methods after MI can be challenging because the set of selected variables may differ across imputed datasets. We propose a two-step variable selection method for multiply imputed datasets with survival outcomes: apply LASSO or adaptive LASSO (ALASSO) to each MI dataset, then refit with ridge regression and combine the estimates, using the variables selected in any or in d% (d = 50, 70, 90, 100) of the MI datasets. For comparison, we also fit stacked MI datasets with weighted penalized regression, as well as a group LASSO approach that enforces consistent selection across imputations. Simulations with Cox models evaluated tuning by AIC, BIC, cross-validation at the minimum error, and the one-standard-error (1SE) rule. Across scenarios, performance differed by both the penalty and the selection rule. More conservative choices, such as ALASSO with BIC and a 50% inclusion frequency, tended to control false positives and gave more stable calibration. The group LASSO approach achieved comparable selection with modestly higher estimation error. Overall, no single method consistently outperformed the others across all scenarios. Our findings suggest that practitioners should weigh trade-offs between selection stability, estimation accuracy, and calibration when applying penalized methods to multiply imputed survival data.
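As a rough illustration of the selection rule described in the abstract, the sketch below takes the variables that a penalized Cox fit selected in each imputed dataset and keeps those meeting an inclusion threshold. It is not the authors' code: the function names, the reading of "d%" as "at least d% of the datasets", and the simple averaging of refit coefficients as the combining step are all assumptions made for illustration, and the penalized and ridge Cox fits themselves are not shown.

```python
from collections import Counter


def inclusion_counts(selected_sets):
    """Count in how many imputed datasets each variable was selected."""
    return Counter(v for s in selected_sets for v in set(s))


def select_variables(selected_sets, d=None):
    """Apply the inclusion rule across imputed datasets.

    d=None keeps variables selected in any dataset; otherwise keep those
    selected in at least d% of datasets (assumed reading of the d% rule).
    """
    m = len(selected_sets)
    counts = inclusion_counts(selected_sets)
    if d is None:
        return sorted(counts)
    # Integer comparison avoids floating-point issues at the threshold.
    return sorted(v for v, c in counts.items() if c * 100 >= d * m)


def pool_point_estimates(per_dataset_coefs, variables):
    """Average second-step coefficients across imputations.

    Assumption: the combined point estimate is the plain mean over
    imputed datasets; variance pooling is omitted.
    """
    m = len(per_dataset_coefs)
    return {v: sum(c[v] for c in per_dataset_coefs) / m for v in variables}


# Hypothetical step-one selections from five imputed datasets.
selections = [
    {"age", "stage"},
    {"age"},
    {"age", "stage", "bmi"},
    {"age", "stage"},
    {"age", "bmi"},
]
kept_50 = select_variables(selections, d=50)
kept_100 = select_variables(selections, d=100)
kept_any = select_variables(selections)
```

With these hypothetical selections, "age" appears in all five datasets, "stage" in three, and "bmi" in two, so raising d shrinks the retained set.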
Related Subject Headings
- 4003 Biomedical engineering