Machine Learning Predictions of Student Outcomes

The Role of Educational Structure and Social Stressors in Czech Municipalities

Authors

  • Martin Flegl, School of Engineering and Sciences, Tecnologico de Monterrey, https://orcid.org/0000-0002-9944-8475
  • Marketa Matulova, Faculty of Business and Administration, Masaryk University, Brno, Czechia
  • Kristyna Vltavska, Faculty of Informatics and Statistics, Prague University of Economics and Business, Czech Republic

DOI:

https://doi.org/10.7160/eriesj.2026.190104

Keywords:

Czech municipalities, educational responsibility, educational structure, machine learning, predictive analytics, social stressors

Abstract

Persistent disparities in student learning outcomes across Czech municipalities highlight the challenge of ensuring equitable access to quality education. These disparities are associated not only with demographic and economic conditions but also with the responsibility of municipalities and institutions to address structural inequalities. This study applies machine learning and SHAP analysis to predict student learning outcomes across municipalities with extended jurisdiction (MEJs), using demographic, economic, social, and housing indicators. Results highlight the dominant role of educational structure, with the share of people without secondary education and the proportion of younger adults holding college degrees emerging as the most influential predictors. Social and housing stressors, including parental executions (debt-enforcement proceedings), poverty destabilization, and housing allowances, further moderate outcomes, revealing nonlinear threshold effects that refine the explanatory narrative. The combined model achieved an R² of 0.629, confirming that while demographic and educational indicators explain most of the variance, contextual vulnerabilities add interpretive richness by identifying vulnerable subgroups. These findings underscore the dual influence of structural educational attainment and social stressors on student performance, while emphasizing educational responsibility as a key dimension in promoting equity and sustainable development.
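To make the attribution idea behind the SHAP analysis concrete, the following is a minimal sketch of exact Shapley values for a single prediction. Everything in it is an assumption made for illustration: the feature names, the baseline values, the coefficients, and the toy scoring function with a threshold on the execution rate are hypothetical and are not the study's data or model. It also uses a brute-force enumeration of feature coalitions against a fixed baseline rather than the tree-based SHAP estimators typically used in practice.

```python
from itertools import combinations
from math import factorial

# Hypothetical baseline ("average municipality"); values are illustrative only.
BASELINE = {"no_secondary_share": 0.15, "young_college_share": 0.20, "execution_rate": 0.05}


def toy_model(x):
    """Hypothetical outcome score with a threshold effect on execution_rate."""
    score = 50.0 - 40.0 * x["no_secondary_share"] + 30.0 * x["young_college_share"]
    if x["execution_rate"] > 0.10:  # the stressor only matters above a threshold
        score -= 5.0
    return score


def shapley_values(model, instance, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    names = list(instance)
    n = len(names)

    def value(coalition):
        x = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(x)

    phi = {}
    for i in names:
        others = [k for k in names if k != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical municipality to explain.
municipality = {"no_secondary_share": 0.25, "young_college_share": 0.10, "execution_rate": 0.12}
phi = shapley_values(toy_model, municipality, BASELINE)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.2f}")
```

The contributions sum to the difference between the municipality's predicted score and the baseline prediction, which is the property that lets SHAP-style values be read as a decomposition of an individual prediction. In this toy setup the execution rate contributes nothing until it crosses the threshold, which mirrors the kind of nonlinear threshold effect the abstract describes.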

Published

2026-03-31

How to Cite

Flegl, M., Matulova, M. and Vltavska, K. (2026) ‘Machine Learning Predictions of Student Outcomes: The Role of Educational Structure and Social Stressors in Czech Municipalities’, Journal on Efficiency and Responsibility in Education and Science, Vol. 19, No. 1, pp. 40–56. https://doi.org/10.7160/eriesj.2026.190104