Stroke Risk Factor Prediction using Gradient Boost Method

Daniel William (1), Genrawan Hoendarto (2), Jimmy Tjen (3)
(1) Widya Dharma Pontianak University, Pontianak, West Kalimantan, Indonesia
(2) Widya Dharma Pontianak University, Pontianak, West Kalimantan, Indonesia
(3) Widya Dharma Pontianak University, Pontianak, West Kalimantan, Indonesia

Abstract

Stroke is a major global health concern, often leading to significant disability or death. Early and accurate prediction of stroke risk can substantially improve patient outcomes. To address this, our study applies the Gradient Boosting method to stroke prediction using a dataset of 750 records. The factors analyzed are gender, age, hypertension, heart disease, marital status, work type, residence type, average glucose level, body mass index, and smoking status. After preprocessing the data, our model achieves an average accuracy of 77.2% across ten runs, demonstrating strong predictive performance. The results identify age as the primary risk factor for stroke, followed by hypertension and smoking history, and a decision tree visualization highlights the most critical risk factors. The model is intended to assist healthcare professionals in identifying high-risk individuals for early intervention. We also compare the Gradient Boosting model with other algorithms to determine the most effective predictive approach.
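The workflow summarized in the abstract (fit a gradient-boosted model, average accuracy over ten runs, rank the risk factors) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic rather than the study's 750 records, only three of the ten factors are used, and the stump-based booster, the `rounds` and `lr` values, the 80/20 split, and all function names are hypothetical choices, not the authors' implementation.

```python
import math
import random

FEATURES = ["age", "hypertension", "smoker"]  # hypothetical subset of the study's factors

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_synthetic(n, rng):
    """Generate illustrative records; this is NOT the study's dataset."""
    X, y = [], []
    for _ in range(n):
        age = rng.randint(20, 90)
        hyp = int(rng.random() < 0.3)
        smoke = int(rng.random() < 0.4)
        # Assumed risk model: age dominates, then hypertension, then smoking.
        p = sigmoid(0.1 * (age - 55) + 0.8 * hyp + 0.5 * smoke)
        X.append([age, hyp, smoke])
        y.append(int(rng.random() < p))
    return X, y

def fit_stump(X, r):
    """Best one-feature threshold split on residuals r, by squared-error reduction."""
    mean = sum(r) / len(r)
    base_sse = sum((v - mean) ** 2 for v in r)
    best = None
    for j in range(len(X[0])):
        # Skip the largest value so the right-hand side is never empty.
        for t in sorted({row[j] for row in X})[:-1]:
            left = [v for row, v in zip(X, r) if row[j] <= t]
            right = [v for row, v in zip(X, r) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or base_sse - sse > best[0]:
                best = (base_sse - sse, j, t, lm, rm)
    return best

def fit_gbm(X, y, rounds=20, lr=0.5):
    """Gradient boosting for log loss: each stump fits the current residuals y - p."""
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p0 / (1 - p0))  # initial score: log-odds of the base rate
    F = [f0] * len(y)
    stumps, importance = [], [0.0] * len(X[0])
    for _ in range(rounds):
        residuals = [yi - sigmoid(fi) for yi, fi in zip(y, F)]  # negative gradient
        gain, j, t, lm, rm = fit_stump(X, residuals)
        stumps.append((j, t, lm, rm))
        importance[j] += gain  # accumulate squared-error reduction per feature
        F = [fi + lr * (lm if row[j] <= t else rm) for fi, row in zip(F, X)]
    return f0, lr, stumps, importance

def predict(model, row):
    f0, lr, stumps, _ = model
    score = f0 + sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in stumps)
    return int(sigmoid(score) >= 0.5)

# One synthetic dataset, ten runs with different random 80/20 splits.
X, y = make_synthetic(300, random.Random(0))
accuracies, total_importance = [], [0.0] * len(FEATURES)
for seed in range(10):
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    train, test = idx[:cut], idx[cut:]
    model = fit_gbm([X[i] for i in train], [y[i] for i in train])
    accuracies.append(sum(predict(model, X[i]) == y[i] for i in test) / len(test))
    total_importance = [a + b for a, b in zip(total_importance, model[3])]

mean_accuracy = sum(accuracies) / len(accuracies)
ranking = sorted(zip(FEATURES, total_importance), key=lambda kv: -kv[1])
print(f"mean accuracy over 10 runs: {mean_accuracy:.3f}")
print("importance ranking:", [name for name, _ in ranking])
```

Averaging over several random splits, as in the ten runs reported above, reduces the dependence of the accuracy estimate on any single partition. In practice a library implementation of gradient boosting would likely replace the hand-written stumps; the sketch only makes the residual-fitting loop and the importance bookkeeping explicit.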


Daniel William, Genrawan Hoendarto, and Jimmy Tjen, “Stroke Risk Factor Prediction using Gradient Boost Method,” Soc. sci. humanities j., vol. 9, no. 01, pp. 6287–6294, Jan. 2025, doi: 10.18535/sshj.v9i01.1573.