DOI: https://doie.org/10.65985/APER.2026191953
Authors: Archana C, Sanju Gupta
Student Dropout Prediction, XGBoost, Logistic Regression, Hybrid Machine Learning Model, Ensemble Learning, SHAP, LIME, SMOTE, Upsampling.
Student dropout is a significant socio-economic issue that affects both the students’ personal growth and the school’s performance. Accurate and timely identification of students at risk of dropping out is crucial for providing the necessary support. This paper introduces an Optimized Hybrid Machine Learning approach that combines XGBoost and Logistic Regression through weighted soft voting (70% XGBoost, 30% Logistic Regression) for student dropout prediction. To overcome class imbalance, both Random Upsampling and SMOTE were used. The dataset was preprocessed with label encoding for categorical variables and Z-normalization for numerical values. Feature selection was also conducted to retain the significant features. Several predictive models, including Logistic Regression, XGBoost, and a stacked approach, were evaluated experimentally. The results show that the proposed Optimized Hybrid Machine Learning Model achieved an accuracy of 96.95% and a weighted F1-score of 0.96, outperforming the baseline models. Using the SHAP and LIME Explainable AI methods, significant academic, behavioral, and socio-economic factors behind student dropout were identified, thus providing a reliable, accurate, and actionable approach to the issue of student dropout.
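As a rough illustration of the weighted soft voting step described above, the sketch below averages the class probabilities of two models with weights 0.7 and 0.3. It is a minimal standalone example, not the authors' implementation: the function names `weighted_soft_vote` and `predict_label` and the probability values for the single example student are hypothetical, and in practice the probabilities would come from trained XGBoost and Logistic Regression models.

```python
from typing import List, Sequence


def weighted_soft_vote(
    prob_sets: Sequence[Sequence[float]], weights: Sequence[float]
) -> List[float]:
    """Return the weighted average of per-model class-probability vectors."""
    if len(prob_sets) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(prob_sets[0])
    combined = [0.0] * n_classes
    for probs, weight in zip(prob_sets, weights):
        if len(probs) != n_classes:
            raise ValueError("all models must report the same classes")
        for k, p in enumerate(probs):
            # Normalise by the weight total so the result is still a distribution.
            combined[k] += (weight / total) * p
    return combined


def predict_label(combined: Sequence[float]) -> int:
    """Pick the class index with the highest combined probability."""
    return max(range(len(combined)), key=lambda k: combined[k])


# Hypothetical probabilities for one student: [P(continues), P(drops out)].
xgb_probs = [0.35, 0.65]      # assumed output of the XGBoost model
logreg_probs = [0.60, 0.40]   # assumed output of the Logistic Regression model

combined = weighted_soft_vote([xgb_probs, logreg_probs], [0.7, 0.3])
label = predict_label(combined)
```

With these assumed inputs the two models disagree, and the larger weight on XGBoost decides the combined prediction. This is the behaviour a 70/30 split is meant to produce.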
Type: Journal
Language: English
Publisher: ya tai jing ji bian ji bu
ISSN: 1000-6052