Machine Learning and Prediction of All-Cause Mortality among Chinese Older Adults
Mortality risk stratification was vital for targeted intervention. This study aimed at building the prediction model of all-cause mortality among Chinese dwelling elderly with different methods including regression models and machine learning models and to compare the performance of machine learning models with regression model on predicting mortality. Additionally, this study also aimed at ranking the predictors of mortality within different models and comparing the predictive value of different groups of predictors using the model with best performance. We used data from the 2008 and 2011 waves of Chinese Longitudinal Healthy Longevity Survey (CLHLS). The follow-up of CLHLS was conducted every 2-3 years till 2018. The analysis sample included 2,448 participants. We used totally 117 predictors to build the prediction model for survival including 61 questionnaire, 41 biomarker and 15 genetics predictors. Four models were built (XG-Boost, random survival forest [RSF], Cox regression with all variables and Cox-backward). We used C-index and integrated Brier score to evaluate the performance of those models. The XG-Boost model and RSF model shows slightly better predictive performance than Cox models and Cox-backward models based on the C-index and integrated Brier score in predicting surviving. Age, activity of daily living and Mini-Mental State Examination score were identified as the top 3 predictors in the XG-Boost and RSF models. Biomarker and questionnaire predictors have a similar predictive value, while genetic predictors have no addictive predictive value when combined with questionnaire or biomarker predictors. In this work, it is shown that machine learning techniques can be a useful tool for both prediction and its performance sightly outperformed the regression model in predicting survival.