Performance Comparison of Machine Learning based Prediction Models for University Students Dropout

  • Received : 2023.08.30
  • Accepted : 2023.10.27
  • Published : 2023.12.31

Abstract

The rising dropout rate among college students nationwide has a serious negative impact not only on individual students but also on universities and society. To proactively identify students at risk of dropping out, this study built dropout prediction models based on a decision tree, a random forest, logistic regression, and deep learning, using academic records that can be easily obtained from each university's academic management system, and then analyzed and compared their performance. The analysis showed that the logistic regression-based model achieved the highest recall, but its F1 score and ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) value were comparatively low. The random forest-based model, in contrast, performed best on every metric other than recall. In addition, to assess model performance over different prediction horizons, we divided the prediction period into short-term (within one semester), medium-term (within two semesters), and long-term (within three semesters). Long-term prediction yielded the highest predictive performance. Through this study, universities are expected to be able to identify students likely to drop out at an early stage, reduce the dropout rate through intensive management of those students, and thereby contribute to the stabilization of university finances.
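The three metrics compared in the abstract (recall, F1 score, and ROC-AUC) can pull in different directions, which is why a model can rank first on recall yet trail on the others. The following is a minimal sketch in plain Python of how these metrics are computed for a binary dropout label. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration; they are not the study's data, models, or code.

```python
from typing import Sequence, Tuple


def recall_precision_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (recall, precision, F1) for the positive class (1 = dropout)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return recall, precision, f1


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> dict:
    """Threshold the scores (hypothetical cut-off of 0.5) and collect all metrics."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    recall, precision, f1 = recall_precision_f1(y_true, y_pred)
    return {"recall": recall, "precision": precision, "f1": f1,
            "roc_auc": roc_auc(y_true, scores)}


# Toy, made-up data: 3 dropouts among 8 students (imbalanced, as dropout data tends to be).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
# Hypothetical model A flags many students: it catches every dropout but raises false alarms.
scores_a = [0.9, 0.7, 0.6, 0.8, 0.65, 0.55, 0.3, 0.2]
# Hypothetical model B is more conservative: it misses one dropout but raises no false alarms.
scores_b = [0.9, 0.8, 0.4, 0.45, 0.3, 0.2, 0.1, 0.1]

result_a = evaluate(y_true, scores_a)
result_b = evaluate(y_true, scores_b)
print("model A:", result_a)
print("model B:", result_b)
```

In this toy setup, model A has the higher recall while model B has the higher F1 score and ROC-AUC, the same kind of trade-off the abstract reports between the logistic regression and random forest models. Which metric to prioritize depends on whether missing an at-risk student or over-flagging students is the costlier error for the institution.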
