Machine Learning Predictions of Student Outcomes
The Role of Educational Structure and Social Stressors in Czech Municipalities
DOI:
https://doi.org/10.7160/eriesj.2026.190104
Keywords:
Czech municipalities, educational responsibility, educational structure, machine learning, predictive analytics, social stressors
Abstract
Persistent disparities in student learning outcomes across Czech municipalities highlight the challenge of ensuring equitable access to quality education. These disparities are associated not only with demographic and economic conditions but also with the responsibility of municipalities and institutions to address structural inequalities. This study applies machine learning and SHAP analysis to predict student learning outcomes across municipalities with extended jurisdiction (MEJs), using demographic, economic, social, and housing indicators. Results highlight the dominant role of educational structure: the share of people without secondary education and the proportion of younger adults holding college degrees emerge as the most influential predictors. Social and housing stressors, including parental executions, poverty destabilization, and housing allowances, further moderate outcomes and reveal nonlinear threshold effects that refine the explanatory narrative. The combined model achieved an R² of 0.629, confirming that demographic and educational indicators explain most of the variance, while contextual vulnerabilities add interpretive richness by identifying vulnerable subgroups. These findings underscore the dual influence of structural educational attainment and social stressors on student performance and emphasize educational responsibility as a key dimension in promoting equity and sustainable development.
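The two quantities named in the abstract, Shapley-based feature attributions and R², can be illustrated with a minimal sketch. This is not the authors' pipeline: the toy predictor, its coefficients, the threshold, and the feature values are all hypothetical. It uses exact enumeration of feature orderings against a single baseline, not the SHAP library or a tree-based model.

```python
from itertools import permutations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one instance.

    Features are switched from their baseline value to the instance value
    in every possible order; each feature's marginal contributions are
    averaged over all orderings.
    """
    n = len(x)
    phi = [0.0] * n
    for order in permutations(range(n)):
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]
            new = predict(current)
            phi[i] += new - prev
            prev = new
    n_orders = factorial(n)
    return [p / n_orders for p in phi]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot


def toy_predict(features):
    """Hypothetical outcome model (assumption, not fitted to any data)."""
    no_secondary, young_degree, stressor = features
    score = 50.0 - 0.8 * no_secondary + 0.5 * young_degree
    if stressor > 10.0:  # hypothetical threshold effect of a social stressor
        score -= 5.0
    return score


# Hypothetical "average" municipality and one municipality to explain.
baseline = [10.0, 20.0, 5.0]
municipality = [15.0, 30.0, 12.0]

phi = shapley_values(toy_predict, municipality, baseline)
for name, value in zip(["no_secondary", "young_degree", "stressor"], phi):
    print(f"{name}: {value:+.2f}")
```

The attributions sum to the difference between the prediction for the municipality and the prediction for the baseline, which is the efficiency property that makes Shapley values convenient for decomposing a single prediction. Exact enumeration scales factorially with the number of features, so it is only practical for toy cases like this one.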
License
Copyright (c) 2026 Martin Flegl, Marketa Matulova, Kristyna Vltavska

This work is licensed under a Creative Commons Attribution 4.0 International License.



