A two-stage approach for combining gene expression and mutation with clinical data improves survival prediction in myelodysplastic syndromes and ovarian cancer.
MOTIVATION: Many traditional clinical prognostic factors have been known for cancer for years, but usually provide poor survival prediction. Genomic information is more easily available now which offers opportunities to build more accurate prognostic models. The challenge is how to integrate them to improve survival prediction. The common approach of jointly analyzing all type of covariates directly in one single model may not improve the prediction due to increased model complexity and cannot be easily applied to different datasets. RESULTS: We proposed a two-stage procedure to better combine different sources of information for survival prediction, and applied the two-stage procedure in two cancer datasets: myelodysplastic syndromes (MDS) and ovarian cancer. Our analysis suggests that the prediction performance of different data types are very different, and combining clinical, gene expression and mutation data using the two-stage procedure improves survival prediction in terms of improved concordance index and reduced prediction error. AVAILABILITY AND IMPLEMENTATION: The two-stage procedure can be implemented in BhGLM package which is freely available at http://www.ssg.uab.edu/bhglm/. CONTACT: nyi@uab.edu.