Development and validation of machine learning models predicting hospitalizations of hypertensive patients over 12 months
https://doi.org/10.15829/1728-8800-2025-4130
EDN: YXVRIN
Abstract
Aim. To develop models that predict hospitalizations of patients with hypertension (HTN) over 12 months using machine learning algorithms, and to validate them on real-world practice data.
Material and methods. From depersonalized electronic health records obtained from the Webiomed platform, 1,165,770 records of 151,492 patients with HTN were selected. After initial selection, 43 anamnestic, constitutional, clinical, and paraclinical features were used as predictors. The models were built with automated machine learning (AutoML) tools. A wide range of algorithms was considered: logistic regression, decision-tree ensembles trained with gradient boosting and bagging, discriminant analysis, a neural network, and a naive Bayes classifier. Data from a single region were used for external validation.
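The abstract does not say how the records were partitioned, so the following is only a minimal sketch of one common way to set up such an evaluation, assuming each record carries hypothetical `patient_id` and `region` fields. One region is held out for external validation, and the remaining data are split into training and internal test sets by patient rather than by record, so that no patient's records appear on both sides.

```python
import random
from collections import defaultdict


def split_records(records, external_region, test_fraction=0.2, seed=42):
    """Hypothetical splitter: hold out one region for external validation,
    then split the remaining records into train/test at the patient level."""
    external = [r for r in records if r["region"] == external_region]
    # Patients seen in the external region are excluded from the internal data
    # so that external validation uses patients the model has never seen.
    external_patients = {r["patient_id"] for r in external}
    internal = [
        r for r in records
        if r["region"] != external_region and r["patient_id"] not in external_patients
    ]

    # Group records by patient so every patient lands in exactly one split.
    by_patient = defaultdict(list)
    for r in internal:
        by_patient[r["patient_id"]].append(r)

    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)
    n_test = round(len(patient_ids) * test_fraction)
    test_ids = set(patient_ids[:n_test])

    train = [r for pid in patient_ids if pid not in test_ids for r in by_patient[pid]]
    test = [r for pid in patient_ids if pid in test_ids for r in by_patient[pid]]
    return train, test, external


# Toy data: 10 patients with 3 records each; patients 8 and 9 live in region "B".
records = [
    {"patient_id": p, "region": "A" if p < 8 else "B", "hospitalized": p % 3 == 0}
    for p in range(10)
    for _ in range(3)
]
train, test, external = split_records(records, external_region="B")
```

Splitting by patient matters here because the dataset contains several records per patient on average; a record-level split would let the model see other records of the same test patient during training and would inflate the internal metrics.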
Results. The XGBoost model performed best, with an area under the ROC curve (AUC) of 0.849 (95% confidence interval (CI): 0.825-0.873) on internal testing and 0.815 (95% CI: 0.797-0.835) on external validation.
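The abstract does not state how the 95% confidence intervals for the AUC were obtained. Purely as an illustration, assuming a percentile bootstrap over test-set patients (a common choice, not confirmed here), the AUC and its interval can be computed as below. The AUC is taken as the probability that a randomly chosen hospitalized patient receives a higher predicted risk than a randomly chosen non-hospitalized one; the function names are hypothetical.

```python
import random


def auc(y_true, y_score):
    """ROC AUC via the Mann-Whitney formulation; tied scores count as 1/2."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point AUC plus a percentile-bootstrap (1 - alpha) confidence interval."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        yb = [y_true[i] for i in idx]
        if 0 < sum(yb) < n:  # AUC is undefined unless both classes are present
            stats.append(auc(yb, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(y_true, y_score), lo, hi


# Toy labels (1 = hospitalized within 12 months) and predicted risks.
y_true = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.5, 0.9]
point, lo, hi = bootstrap_auc_ci(y_true, y_score)
print(f"AUC {point:.3f} (95% CI {lo:.3f}-{hi:.3f})")  # point estimate is 0.875
```

The pairwise loop is quadratic and is meant only to make the definition explicit; for a test set of thousands of patients, a rank-based computation or a library routine would be used instead.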
Conclusion. A new, highly accurate model for predicting hospitalization of patients with HTN was developed from real-world data. On external validation, the final model proved relatively robust to new data from another region; together with its quality metrics, this suggests that it could be approved for use in clinical practice.
About the Authors
A. E. Andreychenko
Russian Federation
Petrozavodsk
A. D. Ermak
Russian Federation
Petrozavodsk
D. V. Gavrilov
Russian Federation
Petrozavodsk
R. E. Novitsky
Russian Federation
Petrozavodsk
O. M. Drapkina
Russian Federation
Moscow
A. V. Gusev
Russian Federation
Moscow
What is already known about the subject?
- Machine learning methods have proven effective for building predictive tools that estimate outcomes of various multifactorial diseases.
- Predicting the progression of hypertension and the risk of non-elective hospitalization in patients with this condition, and intervening in their management in time, is crucial both for the healthcare system as a whole and for preventing complications in individual patients.
What might this study add?
- A dataset was compiled that includes records of more than 150,000 patients with hypertension.
- Using established methods and a range of machine learning algorithms, several models were developed to predict non-elective hospitalizations of these patients.
- The XGBoost-based model showed the best accuracy metrics and stability on external data.
For citation:
Andreychenko A.E., Ermak A.D., Gavrilov D.V., Novitsky R.E., Drapkina O.M., Gusev A.V. Development and validation of machine learning models predicting hospitalizations of hypertensive patients over 12 months. Cardiovascular Therapy and Prevention. 2025;24(1):4130. (In Russ.) https://doi.org/10.15829/1728-8800-2025-4130. EDN: YXVRIN