Drop-Out Prediction in Higher Education Using Imbalanced Multiclass Dataset

Juan Antonio Contreras Montes
María Claudia Bonfante Rodríguez
María Andrea Chamorro


Introduction: High-quality education has the potential to drive social change, promote equity and alleviate poverty. The prosperity of nations is closely linked to the calibre of their education systems. However, student attrition at the university level poses a major obstacle to mitigating social disparities. While many factors contribute to this phenomenon, using machine learning and data analytics to identify influencing variables and predict potential student dropout is an effective approach to addressing this problem.

Objectives: To analyse risk factors for student attrition (dropout) at higher education institutions and to evaluate machine learning algorithms for the early detection of at-risk students, which could benefit all stakeholders.

Methods: The study used an unbalanced dataset from a higher education institution to build a classification model that predicts academic dropout. The dataset was balanced using an oversampling technique, and three machine learning algorithms were evaluated: Random Forest (RF), Support Vector Machines (SVM) and Multinomial Logistic Regression (LR).
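
The abstract does not say which oversampling technique was applied. As an illustration only, the sketch below assumes plain random oversampling, in which minority-class rows are duplicated at random until every class is as large as the largest one. The function name `random_oversample` and the toy labels are hypothetical and are not taken from the study.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class
    has as many rows as the largest class (assumed technique)."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Sample with replacement to fill the gap up to the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy, made-up data: three classes with unequal counts.
labels = ["Graduate"] * 6 + ["Dropout"] * 3 + ["Enrolled"] * 2
rows = [[i] for i in range(len(labels))]
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

In this toy run every class ends up with six rows. Oversampling is normally applied only to the training portion, so that duplicated rows do not leak into the data used for validation.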

Results: The best result was achieved with the RF model, which showed high recall, specificity, F1 and balanced accuracy for each of the three classes: Dropout, Enrolled and Graduate.
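
The per-class metrics named above can be computed from predictions with a one-vs-rest count of true and false positives and negatives. The sketch below is a minimal illustration that assumes per-class balanced accuracy is the mean of recall and specificity, a common convention that the abstract does not confirm. The function name `per_class_metrics` and the toy labels are hypothetical, and the numbers are not results from the study.

```python
def per_class_metrics(y_true, y_pred, classes):
    """One-vs-rest recall, specificity, F1 and balanced accuracy per class."""
    results = {}
    pairs = list(zip(y_true, y_pred))
    for cls in classes:
        # Treat `cls` as the positive class and all others as negative.
        tp = sum(t == cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        tn = sum(t != cls and p != cls for t, p in pairs)

        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[cls] = {
            "recall": recall,
            "specificity": specificity,
            "f1": f1,
            # Assumed definition: mean of recall and specificity.
            "balanced_accuracy": (recall + specificity) / 2,
        }
    return results


# Toy, made-up labels and predictions.
classes = ["Dropout", "Enrolled", "Graduate"]
y_true = ["Dropout", "Dropout", "Enrolled", "Graduate", "Graduate", "Graduate"]
y_pred = ["Dropout", "Enrolled", "Enrolled", "Graduate", "Graduate", "Dropout"]
metrics = per_class_metrics(y_true, y_pred, classes)
for cls in classes:
    print(cls, metrics[cls])
```

Reporting these values per class matters for an imbalanced multiclass problem, because a high overall accuracy can hide poor recall on a small class.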

Conclusions: A total of 23 features were selected. The three machine learning models were trained on 80% of the balanced data. For validation, the remaining 20% of the data, taken from the original (unbalanced) dataset, was used. Two of the trained models, RF and SVM, achieved an overall accuracy higher than 0.93.

Article Details

How to Cite
Juan Antonio Contreras Montes, María Claudia Bonfante Rodríguez, & María Andrea Chamorro. (2023). Drop-Out Prediction in Higher Education Using Imbalanced Multiclass Dataset. Journal for ReAttach Therapy and Developmental Diversities, 6(10s(2)), 1583–1591. https://doi.org/10.53555/jrtdd.v6i10s(2).1255
Author Biographies

Juan Antonio Contreras Montes, Zabud Technologies S.A.S

Department of Research and Development, Zabud Technologies S.A.S, Cartagena, Colombia

María Claudia Bonfante Rodríguez, Universidad del Sinú

Faculty of Engineering, Universidad del Sinú, Cartagena, Colombia

María Andrea Chamorro, Zabud Technologies S.A.S

Department of Research and Development, Zabud Technologies S.A.S, Cartagena, Colombia

