Improving data processing in medical education through machine learning
https://doi.org/10.15829/1728-8800-2025-4446
EDN: OJETWM
Abstract
The exponential growth of biomedical data coupled with advances in machine learning (ML) has created opportunities for more precise diagnosis, enhanced treatment planning, and improved patient management. However, the successful implementation of ML in clinical settings depends on healthcare professionals’ understanding and competency in these technologies. This study examines the effectiveness of integrating ML methodologies into the curricula of Astana Medical University and S. D. Asfendiyarov Kazakh National Medical University. Focusing on childhood allergic conditions such as asthma, rhinitis, and skin diseases, a supervised ML approach (linear regression) was employed to analyze both clinical and educational data. Results showed that the experimental group of students who received ML-integrated training demonstrated significant improvements in analytical competence and data processing accuracy compared to the control group. The ML model achieved a coefficient of determination (R²) of 0.85 with low prediction errors (MAE = 0.45, MSE = 0.30, RMSE = 0.55). Statistical tests supported the hypothesis that structured ML education enhances medical students’ competencies, suggesting that future healthcare professionals trained in ML can better leverage data-driven decision-making for improved patient care. This study contributes to the growing body of literature advocating for ML integration in medical education and underscores the need for further research into advanced ML algorithms and long-term clinical outcomes.
For citations:
Shyndaliyev N., Orynbayeva A., Shadinova K., Barakova A., Nurmukhanbetova N. Improving data processing in medical education through machine learning. Cardiovascular Therapy and Prevention. 2025;24(2S):4446. https://doi.org/10.15829/1728-8800-2025-4446. EDN: OJETWM
Introduction
Machine learning (ML) has become an essential tool for analyzing complex biomedical and clinical data [1]. Advances in information technologies and the growing volume of multi-omics, imaging, and electronic health records have reshaped medical practice, enabling data-driven diagnostics, treatment planning, and patient management [2][3]. The increasing interest in artificial intelligence (AI) in healthcare is evidenced by a tenfold rise in publications since 2012 [2].
Traditional analytical methods are often inadequate for managing the complexity and scale of modern medical datasets. ML offers advantages in tasks such as disease diagnosis, medical imaging, drug development, clinical decision support, and telemedicine [4]. In many cases, its predictive capabilities now exceed those of human experts.
Yet, the successful application of ML in clinical practice requires proper oversight and interpretation by trained professionals [5]. As ML becomes integral to healthcare, equipping future clinicians with foundational ML knowledge is imperative.
To support this educational need, a manual titled Possibilities of Using Machine Learning Algorithms in Medical Data Processing was developed, covering key ML concepts, Python-based practical tasks, and algorithm applications in clinical settings [6]. This study builds on that foundation to evaluate the integration of ML into medical curricula.
Using real clinical and educational data, we assess the effectiveness of ML-based instruction at Astana Medical University and S. D. Asfendiyarov Kazakh National Medical University. Our goal is to enhance students’ analytical skills, improve diagnostic reasoning, and promote data-informed decision-making in future clinical practice.
Literature review
The application of ML in medicine has been extensively examined by both international and local researchers. Gui C. and Chan V. [7] demonstrated ML’s utility in prediction, diagnosis, and improving access to medical care, while Habehh H. and Gohel S. [8] reviewed core ML paradigms — supervised, unsupervised, and reinforcement learning — highlighting applications in radiology, genomics, electronic health records, and neuroimaging.
Dhillon A. and Singh A. [9] summarized various ML algorithms used to analyze healthcare data such as clinical records, omics profiles, and sensor inputs. Ghassemi M, et al. [10] emphasized clinical opportunities, whereas Nayyar A, et al. [11] addressed challenges including data limitations, interpretability, and integration into clinical workflows. Jia Z, et al. [2] and Bi WL, et al. [12] stressed the need for standardized evaluation metrics and robust preprocessing to ensure reliable ML performance and broader educational adoption.
Magoulas G. D. and Prentza A. [13] underscored the importance of visualization, interpretable algorithms, and noise-resilient models for practical use. Sendak MP, et al. [14] evaluated 21 real-world ML implementations, detailing their development and maintenance challenges, thereby offering insights into how ML can be effectively scaled and monitored in healthcare.
Despite rapid advances, the literature reveals several gaps. Most studies concentrate on narrow tasks, lack diverse datasets, or overlook integration into educational curricula. Many ML applications remain underutilized because of limited faculty expertise, inconsistent teaching materials, and a lack of empirical evaluation of student learning outcomes.
This study addresses these issues by expanding ML applications to a broader range of pediatric allergy datasets and by integrating ML modules into medical education. It proposes methods to enhance model resilience and instructional strategies, aiming to improve students’ competencies in analyzing and interpreting complex clinical data. Ultimately, the study contributes to evidence-based strategies for incorporating ML into medical curricula, fostering data-literate clinicians prepared for modern healthcare environments.
Methodology
This study followed a three-stage approach: (1) review of pedagogical and clinical literature on ML integration in medicine; (2) selection of practical, evidence-based ML methods; and (3) implementation and analysis of an educational intervention. The process informed the design of a training module for medical students and guided the use of ML for analyzing pediatric allergy data [15][16].
ML Model Selection
We applied supervised machine learning — specifically linear regression — due to its simplicity, interpretability, and wide use in clinical analytics [17-19]. The goal was to predict allergic disease outcomes using labeled clinical and demographic data. Emphasis was placed on data quality, feature selection, and model validation.
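For reference, the general form of a linear regression model can be written as below. The source does not state how the coefficients were estimated, so the ordinary least squares criterion shown here is an assumption; the symbols (n patients, p features, x_ij, β_j) are introduced for illustration only.

```latex
\hat{y}_i = \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij},
\qquad
\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}}
\sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij} \Bigr)^{2}
```

Each coefficient β_j is the expected change in the predicted outcome for a one-unit change in feature j with the other features held fixed, which is the interpretability property the model choice relies on.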
ML Workflow
The ML pipeline included:
- Problem Definition: Predicting outcomes of childhood allergies.
- Data Collection: Clinical and demographic data related to asthma, rhinitis, dermatitis, and food allergies.
- Preprocessing: Handling missing values, normalization, and encoding.
- Feature Engineering: Selection based on statistical relevance.
- Model Training and Evaluation: Using MAE, MSE, RMSE, and R² as performance metrics.
- Interpretation and Validation: Focused on explainability and accuracy for educational relevance.
Advanced model deployment and maintenance were not considered in this educational context. The methods emphasized practicality and reproducibility over complexity [20].
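The workflow above can be sketched in a few lines of Python. This is a minimal illustration rather than the study's code: the toy records, the field names (`age_onset`, `sex`, `outcome`), and all helper functions are hypothetical, and mean imputation, min-max scaling, and binary encoding are assumed stand-ins for preprocessing choices the text does not specify. The fit uses ordinary least squares via the normal equations.

```python
from statistics import mean

# Hypothetical toy records; names and values are illustrative only.
records = [
    {"age_onset": 1.0, "sex": "F", "outcome": 3.0},
    {"age_onset": 2.0, "sex": "M", "outcome": 2.0},
    {"age_onset": None, "sex": "F", "outcome": 3.5},
    {"age_onset": 4.0, "sex": "M", "outcome": 1.0},
    {"age_onset": 3.0, "sex": "F", "outcome": 2.0},
]

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    fill = mean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale values linearly to the [0, 1] range."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]

def encode_binary(values, positive):
    """Encode a two-level categorical variable as 1.0 / 0.0."""
    return [1.0 if v == positive else 0.0 for v in values]

def fit_ols(rows, targets):
    """Solve the normal equations (X^T X) b = X^T y.

    An intercept column is prepended, so the result is
    [intercept, coef_1, ...]. Fails with ZeroDivisionError if the
    features are perfectly collinear.
    """
    X = [[1.0] + list(r) for r in rows]
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(n)]
        + [sum(x[i] * t for x, t in zip(X, targets))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]

# Preprocessing: imputation, normalization, encoding.
ages = min_max_scale(impute_mean([r["age_onset"] for r in records]))
sexes = encode_binary([r["sex"] for r in records], positive="F")

# Training: one row of features per patient.
features = list(zip(ages, sexes))
targets = [r["outcome"] for r in records]
coefficients = fit_ols(features, targets)
print("intercept and coefficients:", coefficients)
```

In a classroom setting the same steps would more likely be done with a library such as pandas or scikit-learn; the plain-Python version is used here only so that each stage of the pipeline is visible.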
Allergy Domain Relevance
Although over 500 AI/ML applications are FDA-approved (mostly in radiology and cardiology), none are registered in allergy and immunology as of 2023 [14][21]. This study aims to close that gap by focusing on pediatric allergy prediction, a domain where early intervention can significantly impact outcomes [12].
Dataset
The dataset included patient-level variables such as age, gender, ethnicity, insurance type, allergy onset/end, comorbidities (asthma, dermatitis, rhinitis), and treatment history. Table 1 outlines the complete parameter structure. The dataset enabled training and evaluation of ML models that account for demographic and clinical variability without relying on synthetic augmentation [22-24].
Table 1
Key Parameters for Analyzing Patient Allergies and Asthma Profiles
Category | Parameter | Description |
Demographics | Year of Birth | For calculating age |
Demographics | Gender | For analyzing allergy prevalence by gender |
Demographics | Ethnicity | For assessing ethnic variation in allergy susceptibility |
Demographics | Insurance Type | Indicates healthcare access (private, public, or uninsured) |
Demographics | Patient Cohort | For segmenting patients by group or time period |
Demographics | Age at Onset of Allergy | For studying allergy development |
Demographics | Age at End of Study | For calculating the duration of allergy |
Food Allergies | Shellfish Allergy, Milk Allergy, Soy Allergy, Egg Allergy, Wheat Allergy, Peanut Allergy, Sesame Allergy | Presence or absence at study start and end |
Nut Allergies | Overall Status, Tree Nut Allergy, Pecan Allergy, Pistachio Allergy, Almond Allergy, Brazil Nut Allergy | Presence or absence at study start and end |
Comorbidities | Atopic Dermatitis, Allergic Rhinitis, Asthma | Presence or absence at study start and end |
Asthma Medications | First, Last, and Total | First asthma medication prescribed to the patient; most recent asthma medication prescribed; total number of asthma medications prescribed during the study |
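One way to picture a row of this dataset is as a typed record. The sketch below is hypothetical: the class name, field names, and types are assumptions that mirror only a subset of the Table 1 parameters, and the two helper methods implement the derivations the table describes (age from year of birth, allergy duration from onset age and end-of-study age).

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PatientRecord:
    """Hypothetical record covering a subset of the Table 1 parameters."""
    birth_year: int
    gender: str
    insurance_type: str             # "private", "public", or "uninsured"
    age_at_onset: Optional[float]   # None if no allergy onset was recorded
    age_at_end_of_study: float
    asthma_at_start: bool           # presence at study start
    asthma_at_end: bool             # presence at study end

    def age_in(self, year: int) -> int:
        """Age in whole years during the given calendar year."""
        return year - self.birth_year

    def allergy_duration(self) -> Optional[float]:
        """Years between allergy onset and the end of the study."""
        if self.age_at_onset is None:
            return None
        return self.age_at_end_of_study - self.age_at_onset

# Illustrative values only, not taken from the study's data.
patient = PatientRecord(2015, "F", "public", 2.0, 7.5, True, False)
print(patient.age_in(2020), patient.allergy_duration())
```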
Participants and Grouping
Participants were 156 third-year students from pediatrics, medicine, dentistry, and pharmacy programs at Astana Medical University and S. D. Asfendiyarov Kazakh National Medical University. Students were assigned to experimental (n=80) or control (n=76) groups based on schedule compatibility. Those with prior ML experience were excluded. Table 2 details group distribution by program and university.
Table 2
Participants Distribution
№ | University | Education Program | Number of Students | Experimental Group | Control Group |
1 | Astana Medical University | 6В10125-Pediatrics | 23 | 17 | 6 |
1 | Astana Medical University | 6В10123-Medicine | 14 | 10 | 4 |
1 | Astana Medical University | 6В10124-Dentistry | 42 | 30 | 12 |
1 | Astana Medical University | 6В10104-Pharmacy | 20 | 20 | 0 |
2 | S. D. Asfendiyarov Kazakh National Medical University | 6В10116-Medicine | 24 | 5 | 19 |
2 | S. D. Asfendiyarov Kazakh National Medical University | 6В10117-Pediatrics | 26 | 5 | 21 |
2 | S. D. Asfendiyarov Kazakh National Medical University | 6В10118-Dentistry | 30 | 10 | 20 |
Total | | | 156 | 80 | 76 |
Educational Intervention
The experimental group underwent a 4-week ML training module integrated into their pediatrics coursework. Modules included:
- Week 1: Introduction to supervised/unsupervised learning and AI ethics (2 hours);
- Week 2: Data preprocessing using Python (4 hours);
- Week 3: Linear regression and error metrics (4 hours);
- Week 4: Model interpretation and clinical case discussions (2 hours).
Assignments and a final mini-project supported practical engagement. Sessions were guided by instructors with ML expertise.
Results
Visualizing Allergy Prevalence and Patterns
Figure 1 shows the number of allergy cases by type. The most common conditions are skin allergies, respiratory issues, and asthma. Identifying these patterns is critical for guiding early preventive measures and informing clinical resource allocation, especially in contexts where ML-driven insights can enhance diagnostic accuracy and healthcare management strategies [1][2].
Figure 1 Number of Allergy Cases by Type.
Insights into Allergy Onset and Severity
Figure 2 shows that some skin allergies are detectable from birth, which suggests strong genetic or maternal influences. Such early onset underlines the importance of promptly initiating preventive strategies and dietary and environmental controls to mitigate long-term severity.
Figure 2 Age at onset of Skin Allergies.
Similarly, Figure 3 indicates that children with non-contact allergies at the start of the study, irrespective of gender, experienced less worsening of allergic symptoms following medicinal treatment for atopic dermatitis. This finding is consistent with research indicating that the interplay of genetic predisposition and environmental exposure shapes the trajectory of allergic diseases [3][4]: genetic factors contribute to inherent susceptibility, while environmental factors such as allergens, irritants, and infections shape a child’s sensitivity to dietary and inhaled allergens. By using ML models to handle the complexity of these interactions, clinicians and educators can develop more tailored interventions, potentially improving patient outcomes.
Figure 3 Status of atopic dermatitis at the start of the study.
Advances in understanding the genetic and environmental mechanisms that influence the immature immune system are expected to lead to the development of long-term preventive strategies. At present, the most effective method for managing children at high risk of developing allergies is the reduction of sensitization through early dietary and environmental interventions. Such measures facilitate timely identification and appropriate management of emerging allergic symptoms. Since inflammation is a common underlying factor in all allergic conditions, it is critical to initiate anti-inflammatory treatment at the earliest indication of persistent symptoms.
Model Performance and Predictive Accuracy
A supervised ML approach, specifically linear regression, was employed to predict allergic outcomes based on the dataset’s parameters. After appropriate data preprocessing and feature engineering, the model’s performance was evaluated. As shown in Table 3, the model achieved a coefficient of determination (R²) of 0.85, indicating that 85% of the variance in the dependent variable was explained by the independent variables. The mean absolute error (MAE = 0.45), mean squared error (MSE = 0.30), and root mean squared error (RMSE = 0.55) further underscored the model’s accuracy and reliability.
Table 3
Performance Indicators of the Machine Learning Model
Metric | Value |
R² | 0.85 |
MAE | 0.45 |
MSE | 0.30 |
RMSE | 0.55 |
These robust performance metrics align with emerging evidence that ML algorithms can outperform traditional statistical methods in diagnosing and predicting complex clinical outcomes [1-4]. The high R² value and relatively low errors validate the chosen modeling approach and confirm that ML can effectively capture the intricate factors contributing to allergic conditions.
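The four metrics in Table 3 follow standard definitions, sketched below in plain Python. The function name and the toy values are hypothetical and are not the study's data. Because RMSE is the square root of MSE, the reported pair (MSE of 0.30, RMSE of 0.55) can also be checked for internal consistency.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R^2 for paired observations and predictions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(r) for r in residuals) / n
    mse = sum(r * r for r in residuals) / n
    rmse = sqrt(mse)
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # unexplained variation
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total variation
    r2 = 1.0 - ss_res / ss_tot
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Toy values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 8.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)

# Consistency check on the reported figures: sqrt(0.30) rounds to 0.55.
print(round(sqrt(0.30), 2))
```

An R² of 0.85 means the residual sum of squares is 15% of the total sum of squares; it does not by itself say how the model would perform on patients outside the training data.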
Educational Intervention and Experimental Outcomes
In addition to modeling allergic diseases, the study assessed whether integrating ML training into medical curricula could enhance students’ analytical capabilities. Figures 4 and 5 show the baseline results, in percentages, for the control and experimental groups, respectively. Comparing the two figures indicates no significant initial difference in ML understanding between the groups.
Figure 4 Baseline experiment results for the control group.
Figure 5 Baseline experiment results for the experimental group.
After the introduction of ML modules into the curriculum, the formative experiment results were calculated. Figure 6 depicts the formative experiment results for the control group.
Figure 6 Formative results for the control group.
Figure 7 depicts the formative experiment results for the experimental group. Comparing Figures 6 and 7 shows that the experimental group improved substantially in ML knowledge and data processing proficiency, whereas the control group’s performance remained relatively stable.
Figure 7 Formative results for the experimental group.
Statistical Validation of Hypotheses
To quantify the educational impact, chi-square (χ²) tests were conducted. The calculated values, χ²(motivational) = 18.81, χ²(substantive) = 16.11, and χ²(organizational) = 14.67, each exceeded the critical value at p < 0.05. These results support the alternative hypothesis that implementing scientific, theoretical, and practical ML foundations in the educational process improves students’ ML competencies, analytical reasoning, and data-driven decision-making skills. Overall, the results demonstrate the efficacy of ML models in analyzing complex pediatric allergy data and substantiate the value of integrating ML training into medical education. The implications of these findings for clinical practice, educational strategies, and future research are explored in the Discussion.
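A Pearson chi-square statistic of this kind can be computed as sketched below. The source reports neither the contingency tables nor the degrees of freedom, so everything specific here is an assumption made for illustration: the counts are invented, the three competency levels (low, medium, high) are hypothetical, and the critical value 5.991 applies only to 2 degrees of freedom at a significance level of 0.05. Only the group sizes (80 and 76) come from the text.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and level.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: rows are groups, columns are low / medium / high levels.
observed = [
    [10, 30, 40],  # experimental group (n = 80)
    [30, 30, 16],  # control group (n = 76)
]
stat, dof = chi_square_statistic(observed)

# Critical value of the chi-square distribution for dof = 2, alpha = 0.05.
CRITICAL_DOF2_ALPHA_005 = 5.991
print(f"chi2 = {stat:.2f}, dof = {dof}, "
      f"reject H0: {stat > CRITICAL_DOF2_ALPHA_005}")
```

If the statistic exceeds the critical value for the relevant degrees of freedom, the null hypothesis that competency level is independent of group is rejected at that significance level.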
Discussion
This study demonstrates two central findings: (1) supervised ML, specifically linear regression, effectively models complex clinical data related to childhood allergic conditions; and (2) structured ML training significantly improves medical students’ analytical skills.
Clinically, the ML model achieved strong predictive performance, with an R² of 0.85 and low error metrics (MAE, MSE, RMSE), indicating the model’s ability to capture the multifactorial nature of pediatric allergy outcomes. These findings are consistent with prior studies showing that ML outperforms traditional statistical approaches in analyzing large, heterogeneous medical datasets. As such, ML tools can support more precise diagnostics, prognosis, and individualized treatment strategies.
Educationally, the intervention enhanced students’ understanding of key ML concepts, from data preprocessing to model interpretation. The experimental group showed measurable gains over the control group, confirming the value of integrating ML into medical curricula. These results suggest that early exposure to data-driven reasoning equips future clinicians with essential competencies for practicing in increasingly technology-driven healthcare environments.
Importantly, ML is not a replacement for clinical expertise but a complement to it. Trained professionals are needed to interpret algorithmic outputs, assess model reliability, and apply insights ethically in patient care. Embedding ML literacy into training programs fosters responsible use of emerging technologies and prepares students for interdisciplinary collaboration in AI-augmented settings.
From a clinical perspective, the model’s ability to highlight early onset patterns and treatment responses in atopic dermatitis illustrates how ML can guide timely interventions. These insights reinforce the potential for ML-driven tools to support preventive strategies in allergy management.
Limitations include the study’s quasi-experimental design and the use of a relatively basic algorithm. Future work should explore more advanced models (e.g., decision trees, ensembles, neural networks) and assess long-term educational outcomes, including clinical application and patient impact. Longitudinal studies could also evaluate skill retention and real-world integration of ML competencies into clinical workflows.
Overall, this study supports the feasibility and value of incorporating ML into both medical education and pediatric allergy analysis. It lays a foundation for broader adoption of ML training in health curricula and further research into scalable, ethically grounded AI applications in medicine.
Conclusion
This study demonstrated that incorporating ML methodologies into medical education and clinical data analysis can yield substantial benefits. On the clinical side, a supervised ML approach, specifically linear regression, accurately modeled complex pediatric allergy data, achieving an R² of 0.85 and relatively low prediction errors (MAE, MSE, RMSE). These results confirm that ML tools are capable of capturing the nuanced interplay of genetic and environmental factors that shape allergy onset, severity, and progression, aligning with literature that acknowledges ML’s capacity to outpace traditional analytical techniques in healthcare.
Equally important, the educational intervention showed that providing medical students with structured ML training significantly improved their ability to preprocess data, select and interpret ML models, and make informed, data-driven decisions. Such proficiency will be indispensable in an era of healthcare increasingly defined by advanced analytics, where clinicians must understand and critically evaluate algorithmic outputs to ensure patient safety and optimal treatment outcomes. The observed enhancements in student competencies suggest that integrating ML concepts into medical curricula can fill existing educational gaps, better preparing future healthcare professionals for the realities of a data-intensive clinical environment.
While the results are encouraging, the study’s limitations — such as the non-randomized assignment of participants and reliance on linear regression — highlight areas for future research. Expanding investigations to more sophisticated ML algorithms, implementing randomized controlled trials, and conducting longitudinal follow-ups could provide deeper insights into long-term skill retention and the real-world impact of ML-trained clinicians on patient care and healthcare efficiency.
In essence, this research contributes evidence that ML-driven approaches can simultaneously improve the quality of medical data analysis and the caliber of medical education. By validating ML’s effectiveness in handling complex clinical datasets and showing that structured ML instruction enhances students’ analytical capabilities, this work lays a foundation for ongoing developments. As ML techniques continue to evolve, so too must educational strategies and clinical guidelines, ensuring that these powerful tools are leveraged ethically, responsibly, and effectively to advance patient care and health outcomes.
Relationships and Activities: none.
References
1. Cruz JA, Wishart DS. Applications of machine learning in cancer prediction and prognosis. Cancer Inform. 2007;2:59-77.
2. Jia Z, Chen J, Xu X, et al. The importance of resource awareness in artificial intelligence for healthcare. Nat Mach Intell. 2023;5(7):687-98.
3. Cunningham P, Cord M, Delany SJ. Supervised Learning. In: Cord M, Cunningham P, editors. Machine Learning Techniques for Multimedia: Case Studies on Organization and Retrieval [Internet]. Berlin, Heidelberg: Springer; 2008. p. 21-49. doi:10.1007/978-3-540-75171-7_2.
4. Naik N, Rallapalli Y, Krishna M, et al. Demystifying the Advancements of Big Data Analytics in Medical Diagnosis: An Overview. Eng Sci. 2021;19(2):42-58.
5. Scott I, Carter S, Coiera E. Clinician checklist for assessing suitability of machine learning applications in healthcare. BMJ Health & Care Informatics. 2021;28:e100251. doi:10.1136/bmjhci-2020-100251.
6. Orynbaeva AS, Shindaliyev NT, Abdikadyr ZN. Possibilities of Using Machine Learning Algorithms in Medical Data Processing: Manual for Students. Astana Aktaulova’s LLP; 2024. 190 p.
7. Gui C, Chan V. Machine learning in medicine. Univ West Ont Med J. 2017;86(2):76-8.
8. Habehh H, Gohel S. Machine Learning in Healthcare. Curr Genomics. 2021;22(4):291-300.
9. Dhillon A, Singh A. Machine learning in healthcare data analysis: a survey. J Biol Today’s World. 2019;8(6):1-10.
10. Ghassemi M, Naumann T, Schulam P, et al. A Review of Challenges and Opportunities in Machine Learning for Health. AMIA Summits Transl Sci Proc. 2020;2020:191-200.
11. Nayyar A, Gadhavi L, Zaman N. Chapter 2 — Machine learning in healthcare: review, opportunities and challenges. In: Machine learning in healthcare: review, opportunities and challenges. 2021;23-45. doi:10.1016/B978-0-12-821229-5.00011-2.
12. Bi WL, Hosny A, Schabath MB, et al. Artificial intelligence in cancer imaging: Clinical challenges and applications. CA Cancer J Clin. 2019;69(2):127-57.
13. Magoulas GD, Prentza A. Machine learning in medical applications. In: Advanced course on artificial intelligence. Springer; 1999. p. 300-7.
14. Sendak MP, D’Arcy J, Kashyap S, et al. A Path for Translation of Machine Learning Products into Healthcare Delivery. EMJ Innov. 2020. doi:10.33590/emjinnov/19-00172.
15. Deo RC. Machine Learning in Medicine. Circulation. 2015;132(20):1920-30.
16. Zamzam AH, Abdul Wahab AK, Azizan MM, et al. A Systematic Review of Medical Equipment Reliability Assessment in Improving the Quality of Healthcare Services. Front Public Health. 2021;9:753951.
17. Cajal B, Jiménez R, Gervilla E, Montaño JJ. Doing a Systematic Review in Health Sciences. Clin Health. 2020;31(2):77-83.
18. Goodacre R, Broadhurst D, Smilde AK, et al. Proposed minimum reporting standards for data analysis in metabolomics. Metabolomics. 2007;3(3):231-41.
19. Montgomery DC, Peck EA, Vining GG. Introduction to Linear Regression Analysis. 5th edition. Hoboken, NJ: John Wiley & Sons Inc; 2012. 645 p.
20. Rajkomar A, Dean J, Kohane I. Machine Learning in Medicine. N Engl J Med. 2019;380(14):1347-58.
21. van Breugel M, Fehrmann RSN, Bügel M, et al. Current state and prospects of artificial intelligence in allergy. Allergy. 2023;78(10):2623-43.
22. Khan M, Banerjee S, Muskawad S, et al. The Impact of Artificial Intelligence on Allergy Diagnosis and Treatment. Curr Allergy Asthma Rep. 2024;24(7):361-72.
23. Breiteneder H, Diamant Z, Eiwegger T, et al. Future research trends in understanding the mechanisms underlying allergic diseases for improved patient care. Allergy. 2019;74(12):2293-311.
24. Rabe KF, Adachi M, Lai CKW, et al. Worldwide severity and control of asthma in children and adults: The global asthma insights and reality surveys. J Allergy Clin Immunol. 2004;114(1):40-7.
About the Authors
N. Shyndaliyev
Kazakhstan
Shyndaliyev Nurzhan: Candidate of Pedagogical Sciences, teacher of physics and computer science
Astana
A. Orynbayeva
Kazakhstan
Orynbayeva Ainur: senior lecturer at the Department of Biostatistics, Bioinformatics and Information Technology at Astana Medical University; doctoral student; holds a master’s degree in Information Systems from the Kazakh University of Economics, Finance and International Trade
Astana
K. Shadinova
Kazakhstan
Shadinova Kunsulu: Associate Professor in Pedagogy at the Department of Information and Communication Technologies, Asfendiyarov Kazakh National Medical University
Almaty
A. Barakova
Kazakhstan
Barakova Aliya: senior lecturer at the Department of Engineering Disciplines and Good Practices at Asfendiyarov Kazakh National Medical University; holds a master’s degree in Computer Science
Almaty
N. Nurmukhanbetova
Kazakhstan
Nurmukhanbetova Nurgul — Candidate of Chemical Sciences, associate professor at the Department of Chemistry and Biotechnology of Kokshetau Sh. Ualikhanov University
Kokshetau