Improving data processing in medical education through machine learning

N. Shyndaliyev; A. Orynbayeva; K. Shadinova; A. Barakova; N. Nurmukhanbetova

doi:10.15829/1728-8800-2025-4446

EDN: OJETWM

Abstract

The exponential growth of biomedical data coupled with advances in machine learning (ML) has created opportunities for more precise diagnosis, enhanced treatment planning, and improved patient management. However, the successful implementation of ML in clinical settings depends on healthcare professionals’ understanding of and competency in these technologies. This study examines the effectiveness of integrating ML methodologies into the curricula of Astana Medical University and S. D. Asfendiyarov Kazakh National Medical University. Focusing on childhood allergic conditions such as asthma, rhinitis, and skin diseases, a supervised ML approach (linear regression) was employed to analyze both clinical and educational data. Results showed that the experimental group of students who received ML-integrated training demonstrated significant improvements in analytical competence and data processing accuracy compared to the control group. The ML model achieved a coefficient of determination (R²) of 0.85 with low prediction errors (MAE=0.45, MSE=0.30, RMSE=0.55). Statistical tests supported the hypothesis that structured ML education enhances medical students’ competencies, suggesting that future healthcare professionals trained in ML can better leverage data-driven decision-making for improved patient care. This study contributes to the growing body of literature advocating for ML integration in medical education and underscores the need for further research into advanced ML algorithms and long-term clinical outcomes.

Keywords

machine learning, medical education, data processing, allergies, supervised learning

For citations:

Shyndaliyev N., Orynbayeva A., Shadinova K., Barakova A., Nurmukhanbetova N. Improving data processing in medical education through machine learning. Cardiovascular Therapy and Prevention. 2025;24(2S):4446. https://doi.org/10.15829/1728-8800-2025-4446. EDN: OJETWM

Introduction

Machine learning (ML) has become an essential tool for analyzing complex biomedical and clinical data [1]. Advances in information technologies and the growing volume of multi-omics, imaging, and electronic health records have reshaped medical practice, enabling data-driven diagnostics, treatment planning, and patient management [2][3]. The increasing interest in artificial intelligence (AI) in healthcare is evidenced by a tenfold rise in publications since 2012 [2].

Traditional analytical methods are often inadequate for managing the complexity and scale of modern medical datasets. ML offers advantages in tasks such as disease diagnosis, medical imaging, drug development, clinical decision support, and telemedicine [4]. In some narrowly defined tasks, its predictive performance has been reported to exceed that of human experts.

Yet, the successful application of ML in clinical practice requires proper oversight and interpretation by trained professionals [5]. As ML becomes integral to healthcare, equipping future clinicians with foundational ML knowledge is imperative.

To support this educational need, a manual titled Possibilities of Using Machine Learning Algorithms in Medical Data Processing was developed, covering key ML concepts, Python-based practical tasks, and algorithm applications in clinical settings [6]. This study builds on that foundation to evaluate the integration of ML into medical curricula.

Using real clinical and educational data, we assess the effectiveness of ML-based instruction at Astana Medical University and S. D. Asfendiyarov Kazakh National Medical University. Our goal is to enhance students’ analytical skills, improve diagnostic reasoning, and promote data-informed decision-making in future clinical practice.

Literature review

The application of ML in medicine has been extensively examined by both international and local researchers. Gui C. and Chan V. [7] demonstrated ML’s utility in prediction, diagnosis, and improving access to medical care, while Habehh H. and Gohel S. [8] reviewed core ML paradigms — supervised, unsupervised, and reinforcement learning — highlighting applications in radiology, genomics, electronic health records, and neuroimaging.

Dhillon A. and Singh A. [9] summarized various ML algorithms used to analyze healthcare data such as clinical records, omics profiles, and sensor inputs. Ghassemi M, et al. [10] emphasized clinical opportunities, whereas Nayyar A, et al. [11] addressed challenges including data limitations, interpretability, and integration into clinical workflows. Jia Z, et al. [2] and Bi WL, et al. [12] stressed the need for standardized evaluation metrics and robust preprocessing to ensure reliable ML performance and broader educational adoption.

Magoulas G. D. and Prentza A. [13] underscored the importance of visualization, interpretable algorithms, and noise-resilient models for practical use. Sendak MP, et al. [14] evaluated 21 real-world ML implementations, detailing their development and maintenance challenges, thereby offering insights into how ML can be effectively scaled and monitored in healthcare.

Despite rapid advances, the literature reveals several gaps. Most studies concentrate on narrow tasks, lack diverse datasets, or overlook integration into educational curricula. Many ML applications remain underutilized due to limited faculty expertise, inconsistent teaching materials, and a lack of empirical evaluation of student learning outcomes.

This study addresses these issues by expanding ML applications to a broader range of pediatric allergy datasets and by integrating ML modules into medical education. It proposes methods to enhance model resilience and instructional strategies, aiming to improve students’ competencies in analyzing and interpreting complex clinical data. Ultimately, the study contributes to evidence-based strategies for incorporating ML into medical curricula, fostering data-literate clinicians prepared for modern healthcare environments.

Methodology

This study followed a three-stage approach: (1) review of pedagogical and clinical literature on ML integration in medicine; (2) selection of practical, evidence-based ML methods; and (3) implementation and analysis of an educational intervention. The process informed the design of a training module for medical students and guided the use of ML for analyzing pediatric allergy data [15][16].

ML Model Selection

We applied supervised machine learning — specifically linear regression — due to its simplicity, interpretability, and wide use in clinical analytics [17-19]. The goal was to predict allergic disease outcomes using labeled clinical and demographic data. Emphasis was placed on data quality, feature selection, and model validation.
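As a minimal sketch of the technique named above, and not the authors' actual code, ordinary least squares with a single predictor can be fitted in closed form. The function name `fit_simple_ols` and the toy values are illustrative assumptions; the study's model drew on several clinical and demographic features rather than one.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_bar, y_bar = mean(x), mean(y)
    # Slope = covariance(x, y) / variance(x); the shared 1/n factors cancel.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data: age at allergy onset (years) vs. an outcome score.
ages = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [2.1, 3.9, 6.2, 7.8, 10.0]
intercept, slope = fit_simple_ols(ages, scores)
predictions = [intercept + slope * a for a in ages]
```

The fitted slope and intercept can be read directly as "change in outcome per unit of the predictor" and "baseline outcome", which is the interpretability argument made for linear regression in this setting.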

ML Workflow

The ML pipeline included:

Problem Definition: Predicting outcomes of childhood allergies.
Data Collection: Clinical and demographic data related to asthma, rhinitis, dermatitis, and food allergies.
Preprocessing: Handling missing values, normalization, and encoding.
Feature Engineering: Selection based on statistical relevance.
Model Training and Evaluation: Using MAE, MSE, RMSE, and R² as performance metrics.
Interpretation and Validation: Focused on explainability and accuracy for educational relevance.
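The preprocessing step in the list above can be sketched as follows. Mean imputation, min-max scaling, and one-hot encoding are common choices assumed here for illustration; the paper does not state which specific techniques were used, and the column names and values are hypothetical.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Encode a categorical column as one 0/1 indicator list per category."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Hypothetical columns loosely modeled on the parameters in Table 1.
age_at_onset = [2.0, None, 6.0, 4.0]
insurance = ["public", "private", "public", "uninsured"]

age_filled = impute_mean(age_at_onset)      # None is replaced by the mean of 2, 6, 4
age_scaled = min_max_scale(age_filled)      # values now lie in [0, 1]
insurance_encoded = one_hot(insurance)      # one indicator column per category
```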

Advanced model deployment and maintenance were not considered in this educational context. The methods emphasized practicality and reproducibility over complexity [20].

Allergy Domain Relevance

Although over 500 AI/ML applications are FDA-approved (mostly in radiology and cardiology), none are registered in allergy and immunology as of 2023 [14][21]. This study aims to close that gap by focusing on pediatric allergy prediction, a domain where early intervention can significantly impact outcomes [12].

Dataset

The dataset included patient-level variables such as age, gender, ethnicity, insurance type, allergy onset/end, comorbidities (asthma, dermatitis, rhinitis), and treatment history. Table 1 outlines the complete parameter structure. The dataset enabled training and evaluation of ML models that account for demographic and clinical variability without relying on synthetic augmentation [22-24].

Table 1

Key Parameters for Analyzing Patient Allergies and Asthma Profiles

Category	Parameter	Description
Demographics	Year of Birth	For calculating age
	Gender	Analyze allergy prevalence by gender
	Ethnicity	Assess ethnic variations in allergy susceptibility
	Insurance Type	Indicates healthcare access (private, public, or uninsured)
	Patient Cohort	Used for segmentation of patients based on group/time
	Age at Onset of Allergy	For studying allergy development
	Age at End of Study	For calculating the duration of allergy
Food Allergies	Shellfish Allergy, Milk Allergy, Soy Allergy, Egg Allergy, Wheat Allergy, Peanut Allergy, Sesame Allergy	Presence or absence at study start and end
Nut Allergies	Overall Status, Tree Nut Allergy, Pecan Allergy, Pistachio Allergy, Almond Allergy, Brazil Nut Allergy	Presence or absence at study start and end
Comorbidities	Atopic Dermatitis, Allergic Rhinitis, Asthma	Presence or absence at study start and end
Asthma Medications	First, Last and Total	First asthma medication prescribed; most recent asthma medication prescribed; total number of asthma medications prescribed during the study
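Table 1 notes that some parameters exist to derive others (age from year of birth, allergy duration from onset and end-of-study ages). A minimal sketch of those derivations is given below; the record layout and field names are hypothetical, since the paper does not publish its schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PatientRecord:
    # Hypothetical field names chosen to mirror the parameters in Table 1.
    year_of_birth: int
    age_at_onset: Optional[float]   # years; None if no allergy onset was recorded
    age_at_end_of_study: float      # years


def age_in_year(record: PatientRecord, reference_year: int) -> int:
    """Approximate age in whole years at the given calendar year."""
    return reference_year - record.year_of_birth


def allergy_duration(record: PatientRecord) -> Optional[float]:
    """Years between allergy onset and the end of the study, if onset is known."""
    if record.age_at_onset is None:
        return None
    return record.age_at_end_of_study - record.age_at_onset


# Illustrative record, not taken from the study's data.
patient = PatientRecord(year_of_birth=2015, age_at_onset=1.5, age_at_end_of_study=7.0)
no_onset = PatientRecord(year_of_birth=2016, age_at_onset=None, age_at_end_of_study=6.0)
```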

Participants and Grouping

Participants were 156 third-year students from pediatrics, medicine, dentistry, and pharmacy programs at Astana Medical University and S. D. Asfendiyarov Kazakh National Medical University. Students were assigned to experimental (n=80) or control (n=76) groups based on schedule compatibility. Those with prior ML experience were excluded. Table 2 details group distribution by program and university.

Table 2

Participants Distribution

№	University	Education Program	Number of Students	Experimental Group	Control Group
1	Astana Medical University	6В10125-Pediatrics	23	17	6
		6В10123-Medicine	14	10	4
		6В10124-Dentistry	42	30	12
		6В10104-Pharmacy	20	20	0
2	S. D. Asfendiyarov Kazakh National Medical University	6В10116-Medicine	24	5	19
		6В10117-Pediatrics	26	5	21
		6В10118-Dentistry	30	10	20
		Total	156	80	76

Educational Intervention

The experimental group underwent a 4-week ML training module integrated into their pediatrics coursework. Modules included:

Week 1: Introduction to supervised/unsupervised learning and AI ethics (2 hours);
Week 2: Data preprocessing using Python (4 hours);
Week 3: Linear regression and error metrics (4 hours);
Week 4: Model interpretation and clinical case discussions (2 hours).

Assignments and a final mini-project supported practical engagement. Sessions were guided by instructors with ML expertise.

Results

Visualizing Allergy Prevalence and Patterns

Figure 1 shows the number of allergy cases by type. The most common conditions are skin allergies, respiratory issues, and asthma. Identifying these patterns is critical for guiding early preventive measures and informing clinical resource allocation, especially in contexts where ML-driven insights can enhance diagnostic accuracy and healthcare management strategies [1][2].

Figure 1 Number of Allergy Cases by Type.

Insights into Allergy Onset and Severity

Figure 2 illustrates that certain skin allergies can be detected from birth, suggesting strong genetic or maternal influences. Such early onset underlines the importance of promptly initiating preventive strategies and dietary and environmental controls to mitigate long-term severity.

Figure 2 Age at onset of Skin Allergies.

Similarly, Figure 3 indicates that the presence of non-contact allergies at the start of the study, irrespective of gender, was associated with reduced worsening of allergic symptoms following medicinal treatment for atopic dermatitis. This finding is consistent with research indicating that pediatric allergies result primarily from an interplay between genetic predispositions and environmental exposures, which together shape the onset and trajectory of allergic disease [3][4]. Genetic factors contribute to inherent susceptibility, while environmental factors such as allergens, irritants, and infections shape a child’s sensitivity to dietary and inhaled allergens. By using ML models to handle the complexity of these interactions, clinicians and educators can develop more tailored interventions, potentially improving patient outcomes.

Figure 3 Status of atopic dermatitis at the start of the study.

Advances in understanding the genetic and environmental mechanisms that influence the immature immune system are expected to lead to the development of long-term preventive strategies. At present, the most effective method for managing children at high risk of developing allergies is the reduction of sensitization through early dietary and environmental interventions. Such measures facilitate timely identification and appropriate management of emerging allergic symptoms. Since inflammation is a common underlying factor in all allergic conditions, it is critical to initiate anti-inflammatory treatment at the earliest indication of persistent symptoms.

Model Performance and Predictive Accuracy

A supervised ML approach, specifically linear regression, was employed to predict allergic outcomes based on the dataset’s parameters. After data preprocessing and feature engineering, the model’s performance was evaluated. As shown in Table 3, the model achieved a coefficient of determination (R²) of 0.85, indicating that 85% of the variance in the dependent variable was explained by the independent variables. The mean absolute error (MAE=0.45), mean squared error (MSE=0.30), and root mean squared error (RMSE=0.55) further indicate the model’s accuracy.

Table 3

Performance Indicators of the Machine Learning Model

Metric	Value
R²	0.85
MAE	0.45
MSE	0.30
RMSE	0.55
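The four indicators in Table 3 follow standard definitions, which can be sketched as below. This is an illustration with hypothetical observed and predicted values, not a reproduction of the study's computation; the function name `regression_metrics` is an assumption.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE, and R² for paired observed/predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n    # mean absolute error
    ss_res = sum(e * e for e in errors)      # residual sum of squares
    mse = ss_res / n                         # mean squared error
    rmse = math.sqrt(mse)                    # root mean squared error
    y_bar = sum(y_true) / n
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R² is undefined when y_true is constant")
    r2 = 1.0 - ss_res / ss_tot               # share of variance explained
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}


# Hypothetical values for illustration only.
observed = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 8.0]
metrics = regression_metrics(observed, predicted)
```

Under these definitions the reported values are mutually consistent: RMSE is the square root of MSE, and √0.30 ≈ 0.55.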

These performance metrics are consistent with emerging evidence that ML algorithms can outperform traditional statistical methods in diagnosing and predicting complex clinical outcomes [1-4]. The high R² value and relatively low errors support the chosen modeling approach and suggest that ML can capture the factors contributing to allergic conditions.

Educational Intervention and Experimental Outcomes

In addition to modeling allergic diseases, the study assessed whether integrating ML training into medical curricula could enhance students’ analytical capabilities. Figures 4 and 5 depict the baseline results, expressed as percentages, for the control and experimental groups, respectively. A comparison of the two figures indicates no significant initial differences in ML understanding between the groups.

Figure 4 Baseline experiment results for the control group.

Figure 5 Baseline experiment results for the experimental group.

After the introduction of ML modules into the curriculum, the formative experiment results were calculated. Figure 6 depicts the formative experiment results for the control group.

Figure 6 Formative results for the control group.

Figure 7 depicts the formative experiment results for the experimental group. A comparison of Figures 6 and 7 shows that the experimental group exhibited substantial improvements in ML knowledge and data processing proficiency, whereas the control group’s performance remained relatively stable.

Figure 7 Formative results for the experimental group.

Statistical Validation of Hypotheses

To quantify the educational impact, chi-square (χ²) tests were conducted. The calculated values, χ²(motivational)=18.81, χ²(substantive)=16.11, and χ²(organizational)=14.67, each exceeded the critical value at the 0.05 significance level. These results supported the alternative hypothesis that implementing scientific, theoretical, and practical ML foundations in the educational process improves students’ ML competencies, analytical reasoning, and data-driven decision-making skills. Taken together, the results indicate that the ML model captured patterns in complex pediatric allergy data and that integrating ML training into medical education was associated with improved student competencies.
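A chi-square test of this kind compares observed counts in a contingency table with the counts expected if group and competency level were independent. The sketch below uses hypothetical counts (two groups by three competency levels), because the paper reports neither the underlying tables nor the degrees of freedom; the critical value 5.991 is the standard tabulated value for 2 degrees of freedom at the 0.05 level and applies only under that assumed layout.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows = experimental, control; columns = high, medium, low.
counts = [
    [40, 30, 10],
    [15, 35, 26],
]
statistic, dof = chi_square_statistic(counts)

CRITICAL_0_05_DF2 = 5.991  # tabulated chi-square critical value, df=2, alpha=0.05
reject_null = statistic > CRITICAL_0_05_DF2
```

When the statistic exceeds the critical value, the null hypothesis of no association between group membership and competency level is rejected at the chosen significance level.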

Discussion

This study demonstrates two central findings: (1) supervised ML, specifically linear regression, effectively models complex clinical data related to childhood allergic conditions; and (2) structured ML training significantly improves medical students’ analytical skills.

Clinically, the ML model achieved strong predictive performance, with an R² of 0.85 and low error metrics (MAE, MSE, RMSE), indicating the model’s ability to capture the multifactorial nature of pediatric allergy outcomes. These findings are consistent with prior studies showing that ML outperforms traditional statistical approaches in analyzing large, heterogeneous medical datasets. As such, ML tools can support more precise diagnostics, prognosis, and individualized treatment strategies.

Educationally, the intervention enhanced students’ understanding of key ML concepts, from data preprocessing to model interpretation. The experimental group showed measurable gains over the control group, confirming the value of integrating ML into medical curricula. These results suggest that early exposure to data-driven reasoning equips future clinicians with essential competencies for practicing in increasingly technology-driven healthcare environments.

Importantly, ML is not a replacement for clinical expertise but a complement to it. Trained professionals are needed to interpret algorithmic outputs, assess model reliability, and apply insights ethically in patient care. Embedding ML literacy into training programs fosters responsible use of emerging technologies and prepares students for interdisciplinary collaboration in AI-augmented settings.

From a clinical perspective, the model’s ability to highlight early onset patterns and treatment responses in atopic dermatitis illustrates how ML can guide timely interventions. These insights reinforce the potential for ML-driven tools to support preventive strategies in allergy management.

Limitations include the study’s quasi-experimental design and the use of a relatively basic algorithm. Future work should explore more advanced models (e.g., decision trees, ensembles, neural networks) and assess long-term educational outcomes, including clinical application and patient impact. Longitudinal studies could also evaluate skill retention and real-world integration of ML competencies into clinical workflows.

Overall, this study supports the feasibility and value of incorporating ML into both medical education and pediatric allergy analysis. It lays a foundation for broader adoption of ML training in health curricula and further research into scalable, ethically grounded AI applications in medicine.

Conclusion

This study demonstrated that incorporating ML methodologies into medical education and clinical data analysis can yield substantial benefits. On the clinical side, a supervised ML approach, specifically linear regression, accurately modeled complex pediatric allergy data, achieving an R² of 0.85 and relatively low prediction errors (MAE, MSE, RMSE). These results confirm that ML tools are capable of capturing the nuanced interplay of genetic and environmental factors that shape allergy onset, severity, and progression, aligning with literature that acknowledges ML’s capacity to outpace traditional analytical techniques in healthcare.

Equally important, the educational intervention showed that providing medical students with structured ML training significantly improved their ability to preprocess data, select and interpret ML models, and make informed, data-driven decisions. Such proficiency will be indispensable in an era of healthcare increasingly defined by advanced analytics, where clinicians must understand and critically evaluate algorithmic outputs to ensure patient safety and optimal treatment outcomes. The observed enhancements in student competencies suggest that integrating ML concepts into medical curricula can fill existing educational gaps, better preparing future healthcare professionals for the realities of a data-intensive clinical environment.

While the results are encouraging, the study’s limitations — such as the non-randomized assignment of participants and reliance on linear regression — highlight areas for future research. Expanding investigations to more sophisticated ML algorithms, implementing randomized controlled trials, and conducting longitudinal follow-ups could provide deeper insights into long-term skill retention and the real-world impact of ML-trained clinicians on patient care and healthcare efficiency.

In essence, this research contributes evidence that ML-driven approaches can simultaneously improve the quality of medical data analysis and the caliber of medical education. By validating ML’s effectiveness in handling complex clinical datasets and showing that structured ML instruction enhances students’ analytical capabilities, this work lays a foundation for ongoing developments. As ML techniques continue to evolve, so too must educational strategies and clinical guidelines, ensuring that these powerful tools are leveraged ethically, responsibly, and effectively to advance patient care and health outcomes.

Relationships and Activities: none.

References

1. Cruz JA, Wishart DS. Applications of machine learning in cancer prediction and prognosis. Cancer Inform. 2007;2:59-77.

2. Jia Z, Chen J, Xu X, et al. The importance of resource awareness in artificial intelligence for healthcare. Nat Mach Intell. 2023;5(7):687-98.

3. Cunningham P, Cord M, Delany SJ. Supervised Learning. In: Cord M, Cunningham P, editors. Machine Learning Techniques for Multimedia: Case Studies on Organization and Retrieval. Berlin, Heidelberg: Springer; 2008. p. 21-49. doi:10.1007/978-3-540-75171-7_2.

4. Naik N, Rallapalli Y, Krishna M, et al. Demystifying the Advancements of Big Data Analytics in Medical Diagnosis: An Overview. Eng Sci. 2021;19(2):42-58.

5. Scott I, Carter S, Coiera E. Clinician checklist for assessing suitability of machine learning applications in healthcare. BMJ Health Care Inform. 2021;28:e100251. doi:10.1136/bmjhci-2020-100251.

6. Orynbaeva AS, Shindaliyev NT, Abdikadyr ZN. Possibilities of Using Machine Learning Algorithms in Medical Data Processing: Manual for Students. Astana Aktaulova’s LLP; 2024. 190 p.

7. Gui C, Chan V. Machine learning in medicine. Univ West Ont Med J. 2017;86(2):76-8.

8. Habehh H, Gohel S. Machine Learning in Healthcare. Curr Genomics. 2021;22(4):291-300.

9. Dhillon A, Singh A. Machine learning in healthcare data analysis: a survey. J Biol Today’s World. 2019;8(6):1-10.

10. Ghassemi M, Naumann T, Schulam P, et al. A Review of Challenges and Opportunities in Machine Learning for Health. AMIA Summits Transl Sci Proc. 2020;2020:191-200.

11. Nayyar A, Gadhavi L, Zaman N. Chapter 2 — Machine learning in healthcare: review, opportunities and challenges. In: Machine learning in healthcare: review, opportunities and challenges. 2021;23-45. doi:10.1016/B978-0-12-821229-5.00011-2.

12. Bi WL, Hosny A, Schabath MB, et al. Artificial intelligence in cancer imaging: Clinical challenges and applications. CA Cancer J Clin. 2019;69(2):127-57.

13. Magoulas GD, Prentza A. Machine learning in medical applications. In: Advanced course on artificial intelligence. Springer; 1999. p. 300-7.

14. Sendak MP, D’Arcy J, Kashyap S, et al. A Path for Translation of Machine Learning Products into Healthcare Delivery. EMJ Innov. 2020. doi:10.33590/emjinnov/19-00172.

15. Deo RC. Machine Learning in Medicine. Circulation. 2015;132(20):1920-30.

16. Zamzam AH, Abdul Wahab AK, Azizan MM, et al. A Systematic Review of Medical Equipment Reliability Assessment in Improving the Quality of Healthcare Services. Front Public Health. 2021;9:753951.

17. Cajal B, Jiménez R, Gervilla E, Montaño JJ. Doing a Systematic Review in Health Sciences. Clin Health. 2020;31(2):77-83.

18. Goodacre R, Broadhurst D, Smilde AK, et al. Proposed minimum reporting standards for data analysis in metabolomics. Metabolomics. 2007;3(3):231-41.

19. Montgomery DC, Peck EA, Vining GG. Introduction to Linear Regression Analysis. 5th edition. Hoboken, NJ: John Wiley & Sons Inc; 2012. 645 p.

20. Rajkomar A, Dean J, Kohane I. Machine Learning in Medicine. N Engl J Med. 2019;380(14):1347-58.

21. van Breugel M, Fehrmann RSN, Bügel M, et al. Current state and prospects of artificial intelligence in allergy. Allergy. 2023;78(10):2623-43.

22. Khan M, Banerjee S, Muskawad S, et al. The Impact of Artificial Intelligence on Allergy Diagnosis and Treatment. Curr Allergy Asthma Rep. 2024;24(7):361-72.

23. Breiteneder H, Diamant Z, Eiwegger T, et al. Future research trends in understanding the mechanisms underlying allergic diseases for improved patient care. Allergy. 2019;74(12):2293-311.

24. Rabe KF, Adachi M, Lai CKW, et al. Worldwide severity and control of asthma in children and adults: The global asthma insights and reality surveys. J Allergy Clin Immunol. 2004;114(1):40-7.

About the Authors

N. Shyndaliyev

L. N. Gumilyov Eurasian National University
Kazakhstan

Shyndaliyev Nurzhan, Candidate of Pedagogical Sciences, teacher of physics and computer science

Astana

A. Orynbayeva

L. N. Gumilyov Eurasian National University, Astana; Astana Medical University, Astana
Kazakhstan

Orynbayeva Ainur, senior lecturer at the Department of Biostatistics, Bioinformatics and Information Technology at Astana Medical University; doctoral student; holds a master’s degree in Information Systems from the Kazakh University of Economics, Finance and International Trade

Astana

K. Shadinova

S. D. Asfendiyarov Kazakh National Medical University, Almaty
Kazakhstan

Shadinova Kunsulu, Associate Professor in Pedagogy at the Department of Information and Communication Technologies, S. D. Asfendiyarov Kazakh National Medical University

Almaty

A. Barakova

S. D. Asfendiyarov Kazakh National Medical University, Almaty
Kazakhstan

Barakova Aliya, senior lecturer at the Department of Engineering Disciplines and Good Practices at the S. D. Asfendiyarov Kazakh National Medical University, master’s degree in Computer Science

Almaty

N. Nurmukhanbetova

Sh. Ualikhanov Kokshetau University, Kokshetau
Kazakhstan

Nurmukhanbetova Nurgul, Candidate of Chemical Sciences, associate professor at the Department of Chemistry and Biotechnology of Sh. Ualikhanov Kokshetau University

Kokshetau

This work is licensed under a Creative Commons Attribution 4.0 License.

ISSN 1728-8800 (Print)
ISSN 2619-0125 (Online)
