Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models

Publication, Journal Article
Liu, J; Sun, Y; Ma, J; Tu, J; Deng, Y; He, P; Li, R; Hu, F; Huang, H; Zhou, X; Xu, S
Published in: Informatics in Medicine Unlocked
January 1, 2021

Background: In China, stroke has been the leading cause of death in recent years. It is a major cause of long-term physical and cognitive impairment, which places great pressure on the national public health system. In a country as large as China, evaluating the risk of stroke is important for its prevention and treatment.

Methods: This study analyzes a data set of 2,000 hospitalized stroke patients from 2018 and 27,583 residents from 2017 to 2020. Using the cleaned data, three models of stroke risk level are built with machine learning methods. The importance of the “8+2” risk factors defined by the China National Stroke Prevention Project (CSPP) is evaluated with decision tree and random forest models. The importance and SHAP values of more detailed features are evaluated and ranked with the random forest model. Furthermore, a logistic regression model is applied to estimate the probability of stroke at each risk level.

Results: Among the “8+2” risk factors, the decision tree model ranks Hypertension (0.4995), Physical Inactivity (0.08486) and Diabetes Mellitus (0.07889) as the top three, while the random forest model ranks Hypertension (0.3966), Hyperlipidemia (0.1229) and Physical Inactivity (0.1146) as the top three. Beyond the “8+2” factors, the random forest model is also used to evaluate the importance of lifestyle, demographic and medical-measurement features. The top five features are Systolic Blood Pressure (SBP) (0.3670), Diastolic Blood Pressure (DBP) (0.1541), Physical Inactivity (0.0904), Body Mass Index (BMI) (0.0721) and Fasting Blood Glucose (FBG) (0.0531). SHAP values show that DBP, Physical Inactivity, SBP, BMI, Smoking, FBG and Triglyceride (TG) are positively correlated with stroke risk, while High-density Lipoprotein (HDL) is negatively correlated with it. Combined with the data of the 2,000 hospitalized stroke patients, the logistic regression model gives average stroke probabilities of 7.20%±0.55% for low-risk patients, 19.02%±0.94% for medium-risk patients and 83.89%±0.97% for high-risk patients.

Conclusion: Based on census data from Shanxi Province, we investigate stroke risk factors and their ranking. Hypertension, Physical Inactivity and Overweight rank as the top three stroke risk factors in Shanxi. The probability of stroke is also estimated with our interpretable machine learning methods.
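The abstract names logistic regression as the step that turns risk levels into stroke probabilities but gives no further detail. Below is a minimal sketch, assuming a plain unregularized logistic regression fitted by batch gradient descent on two binary features. The feature names, the toy `RECORDS`, the learning rate and the epoch count are hypothetical, and reporting the spread as a sample standard deviation is an assumption, since the abstract does not say what its ± denotes. This is an illustration of the technique, not the authors' pipeline.

```python
import math
from statistics import mean, stdev

# Hypothetical toy records: (risk_level, [hypertension, physical_inactivity], stroke_label).
# These values are invented for illustration and are NOT data from the study.
RECORDS = [
    ("low", [0, 0], 0), ("low", [0, 0], 0), ("low", [0, 1], 0), ("low", [1, 0], 0),
    ("medium", [1, 0], 0), ("medium", [0, 1], 1), ("medium", [1, 0], 1), ("medium", [0, 1], 0),
    ("high", [1, 1], 1), ("high", [1, 1], 1), ("high", [1, 1], 0), ("high", [1, 1], 1),
]


def sigmoid(z: float) -> float:
    """Logistic function: maps a linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(b, w, x):
    """P(stroke | x) = sigmoid(b + w . x) under a logistic regression model."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.5, epochs=5000):
    """Fit intercept b and weights w by batch gradient descent on the mean log-loss.

    lr and epochs are illustrative assumptions, not values from the study.
    """
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            err = predict_proba(b, w, xi) - yi  # d(log-loss)/d(score) for one record
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def summarize_by_level(records, b, w):
    """Return {risk_level: (mean, sample_sd)} of predicted stroke probabilities.

    Using the sample SD for the spread is an assumption; the abstract's ± is unspecified.
    """
    groups = {}
    for level, x, _ in records:
        groups.setdefault(level, []).append(predict_proba(b, w, x))
    return {level: (mean(ps), stdev(ps)) for level, ps in groups.items()}


if __name__ == "__main__":
    X = [x for _, x, _ in RECORDS]
    y = [label for _, _, label in RECORDS]
    b, w = fit_logistic(X, y)
    for level, (m, sd) in summarize_by_level(RECORDS, b, w).items():
        print(f"{level}: {100 * m:.2f}% ± {100 * sd:.2f}%")
```

The printed per-level lines have the same "mean ± spread" shape as the figures reported above, but numbers produced from toy data carry no clinical meaning. Gradient descent keeps the sketch dependency-free; in practice a library implementation such as scikit-learn's `LogisticRegression` (which applies an L2 penalty by default) would be the usual choice.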

Published In

Informatics in Medicine Unlocked

DOI

10.1016/j.imu.2021.100712

ISSN

2352-9148

Publication Date

January 1, 2021

Volume

26

Related Subject Headings

  • 4203 Health services and systems
 

Citation

APA: Liu, J., Sun, Y., Ma, J., Tu, J., Deng, Y., He, P., … Xu, S. (2021). Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models. Informatics in Medicine Unlocked, 26. https://doi.org/10.1016/j.imu.2021.100712
Chicago: Liu, J., Y. Sun, J. Ma, J. Tu, Y. Deng, P. He, R. Li, et al. “Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models.” Informatics in Medicine Unlocked 26 (January 1, 2021). https://doi.org/10.1016/j.imu.2021.100712.
ICMJE: Liu J, Sun Y, Ma J, Tu J, Deng Y, He P, et al. Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models. Informatics in Medicine Unlocked. 2021 Jan 1;26.
MLA: Liu, J., et al. “Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models.” Informatics in Medicine Unlocked, vol. 26, Jan. 2021. Scopus, doi:10.1016/j.imu.2021.100712.
NLM: Liu J, Sun Y, Ma J, Tu J, Deng Y, He P, Li R, Hu F, Huang H, Zhou X, Xu S. Analysis of main risk factors causing stroke in Shanxi Province based on machine learning models. Informatics in Medicine Unlocked. 2021 Jan 1;26.