Cardiovascular Therapy and Prevention

Development and validation of machine learning models predicting hospitalizations of hypertensive patients over 12 months

https://doi.org/10.15829/1728-8800-2025-4130

EDN: YXVRIN

Abstract

Aim. To develop machine learning models that predict hospitalizations of patients with hypertension (HTN) over 12 months, and to validate them on real-world practice data.

Material and methods. From depersonalized electronic health records obtained from the Webiomed platform, 1,165,770 records of 151,492 patients with HTN were selected. After initial selection, 43 anamnestic, constitutional, clinical, and paraclinical features were used as predictors. The models were built with automated machine learning tools. A wide range of algorithms was considered, including logistic regression, decision tree-based methods using gradient boosting and bagging, discriminant analysis, a neural network algorithm, and a naive Bayes classifier. Data from a single region were used for external validation.
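The validation design described above (one region held out for external validation, the remaining data used for development and internal testing) can be sketched as follows. This is only an illustration under stated assumptions: the record keys `patient_id` and `region`, the 20% test fraction, and the choice to split by patient rather than by record are all hypothetical and are not taken from the abstract.

```python
import random
from collections import defaultdict


def split_records(records, external_region, test_fraction=0.2, seed=0):
    """Split records into train, internal test, and external validation sets.

    Each record is a dict with the hypothetical keys 'patient_id' and 'region'.
    Records from `external_region` are set aside entirely for external validation.
    """
    external = [r for r in records if r["region"] == external_region]
    internal = [r for r in records if r["region"] != external_region]

    # Group by patient so that all records of one patient stay on the same
    # side of the train/test split (an assumption, made to avoid leakage).
    by_patient = defaultdict(list)
    for r in internal:
        by_patient[r["patient_id"]].append(r)

    patient_ids = sorted(by_patient)
    random.Random(seed).shuffle(patient_ids)
    n_test = int(len(patient_ids) * test_fraction)

    test = [r for pid in patient_ids[:n_test] for r in by_patient[pid]]
    train = [r for pid in patient_ids[n_test:] for r in by_patient[pid]]
    return train, test, external
```

Splitting by patient matters here because the dataset holds several records per patient on average; a record-level split would let the same patient appear in both training and test data.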

Results. The XGBoost model performed best, achieving an area under the ROC curve (AUC) of 0.849 (95% confidence interval: 0.825-0.873) in internal testing and 0.815 (95% confidence interval: 0.797-0.835) in external validation.
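For readers unfamiliar with how an AUC and its confidence interval can be obtained, the sketch below computes a rank-based AUC and a percentile bootstrap interval using only the Python standard library. The abstract does not state how the reported intervals were calculated, so the percentile bootstrap, the number of resamples, and the function names `roc_auc` and `bootstrap_auc_ci` are assumptions chosen for illustration.

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")

    # Assign average 1-based ranks to the scores, handling ties.
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # Mann-Whitney U statistic of the positive class, scaled to [0, 1].
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_auc_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    aucs = []
    while len(aucs) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_resamples)]
    upper = aucs[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return lower, upper
```

Here `labels` would be 0/1 indicators of hospitalization within 12 months and `scores` the predicted probabilities from a model; calling `bootstrap_auc_ci(labels, scores)` returns the lower and upper bounds of the interval.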

Conclusion. A new, highly accurate model for predicting hospitalization of HTN patients was developed from real-world data. External validation showed that the final model is relatively robust to new data from another region, which, together with its quality metrics, supports the possibility of approving it for use in clinical practice.

About the Authors

A. E. Andreychenko
OOO K-Sky
Petrozavodsk, Russian Federation

A. D. Ermak
OOO K-Sky
Petrozavodsk, Russian Federation

D. V. Gavrilov
OOO K-Sky
Petrozavodsk, Russian Federation

R. E. Novitsky
OOO K-Sky
Petrozavodsk, Russian Federation

O. M. Drapkina
National Medical Research Center for Therapy and Preventive Medicine
Moscow, Russian Federation

A. V. Gusev
Central Research Institute for Health Organization and Informatics; Research and Practical Clinical Center for Diagnostics and Telemedicine Technologies
Moscow, Russian Federation

What is already known about the subject?

  • Machine learning methods have proven effective in building tools that predict the outcomes of various multifactorial diseases.
  • Predicting the progression of hypertension and the risk of non-elective hospitalization in patients with this condition, and intervening in their management in time, is crucial both for the healthcare system as a whole and for preventing complications in individual patients.

What might this study add?

  • A dataset comprising records of more than 150,000 patients with hypertension was assembled.
  • Several models predicting non-elective hospitalizations of these patients were developed using widely accepted tools and a range of machine learning algorithms.
  • The XGBoost-based model showed the best accuracy metrics and stability on external data.

For citations:


Andreychenko A.E., Ermak A.D., Gavrilov D.V., Novitsky R.E., Drapkina O.M., Gusev A.V. Development and validation of machine learning models predicting hospitalizations of hypertensive patients over 12 months. Cardiovascular Therapy and Prevention. 2025;24(1):4130. (In Russ.) https://doi.org/10.15829/1728-8800-2025-4130. EDN: YXVRIN

This work is licensed under a Creative Commons Attribution 4.0 License.


ISSN 1728-8800 (Print)
ISSN 2619-0125 (Online)