Actionable Learning Analytics

Predicting University Performance Levels with Interpretable Machine Learning

Authors

E. De La Hoz, C. Garcia-Yerena and I. Torres-Rojas

DOI:

https://doi.org/10.7160/eriesj.2026.190102

Keywords:

learning analytics, XAI, academic performance, SHAP, explainability

Abstract

Higher education institutions need timely, explainable tools to identify students at risk of low performance on large-scale examinations and to guide targeted academic support. In response, this study proposes an explainable machine learning framework to predict undergraduate students' performance levels in Colombia's SABER PRO examination. Using student background variables (e.g., gender, region, school type, parental education, and occupation) and SABER 11 standardised test scores (Critical Reading, Mathematics, Citizenship Skills, Science, and English), we formulate a binary classification problem that distinguishes desirable outcomes (levels 3–4) from non-desirable outcomes (levels 1–2). We benchmark five classifiers spanning linear and non-linear model families (XGBoost, GLMNET, SVM, DT, and LDA) under a 10-fold cross-validation protocol with systematic hyperparameter tuning. Model performance is assessed through confusion matrices and the area under the ROC curve (AUC). To support educational decision-making, we complement the predictive results with explainability analyses, including global feature importance and individual-level SHAP explanations, which make the key drivers of performance levels transparent. The proposed approach provides actionable learning analytics to guide early academic support, promote responsible and transparent educational decision-making, and increase the likelihood that students reach desirable SABER PRO performance levels.
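The binary formulation and evaluation protocol described above can be illustrated with a minimal, dependency-free Python sketch. It assumes performance levels arrive as integers from 1 to 4. The helper names (`binarize_level`, `stratified_folds`, `confusion_counts`, `roc_auc`) are hypothetical and are not taken from the study, and the sketch does not reproduce the benchmarked models (XGBoost, GLMNET, SVM, DT, LDA), the hyperparameter search, or the SHAP analysis. The scores in the usage section are made-up toy values standing in for a classifier's predicted probabilities.

```python
import random
from collections import defaultdict


def binarize_level(level):
    """Hypothetical helper: map a SABER PRO level (1-4) to the binary target.

    Levels 3-4 (desirable) -> 1; levels 1-2 (non-desirable) -> 0.
    """
    if level in (3, 4):
        return 1
    if level in (1, 2):
        return 0
    raise ValueError(f"unexpected performance level: {level}")


def stratified_folds(labels, k=10, seed=0):
    """Hypothetical helper: split row indices into k folds, dealing each
    class out round-robin so every fold keeps roughly the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels in {0, 1}."""
    tn = fp = fn = tp = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count 0.5). O(n_pos * n_neg), fine for small examples."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data: made-up levels and classifier scores, not study results.
    levels = [1, 2, 3, 4, 3, 2, 4, 1]
    y = [binarize_level(lvl) for lvl in levels]
    scores = [0.2, 0.4, 0.7, 0.9, 0.35, 0.3, 0.8, 0.1]
    preds = [1 if s >= 0.5 else 0 for s in scores]
    print(confusion_counts(y, preds))  # (4, 0, 1, 3)
    print(roc_auc(y, scores))          # 0.9375
    # k=2 only because the toy set has 8 rows; the abstract describes k=10.
    print(stratified_folds(y, k=2, seed=0))
```

The AUC here follows the rank (Mann–Whitney) interpretation, so it is threshold-free, whereas the confusion matrix depends on the 0.5 cut-off chosen purely for illustration. For realistic cohort sizes, a sort-based AUC or an established library routine would be preferable to the pairwise loop.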

Published

2026-03-31

How to Cite

De La Hoz, E., Garcia-Yerena, C. and Torres-Rojas, I. (2026) ‘Actionable Learning Analytics: Predicting University Performance Levels with Interpretable Machine Learning’, Journal on Efficiency and Responsibility in Education and Science, vol. 19, no. 1, pp. 15–27. https://doi.org/10.7160/eriesj.2026.190102