2025, Vol. 6, Issue 2, Part C
An Intelligent Machine Learning Pipeline for Early Diabetes Prediction: CatBoost Ensemble with SMOTTEEN and Optuna Tuning
Author(s): Vishal Verma, Satish Kumar, Vandna Rani Verma and Alka Agrawal
Abstract: Diabetes is one of the most widespread diseases worldwide. It arises when the level of glucose in the bloodstream exceeds the normal range. Accurate prediction of diabetes is therefore crucial in healthcare, and machine learning (ML) techniques play an essential role in building predictive models for healthcare analysis. This study proposes an ensemble approach based on the CatBoost algorithm for early diabetes prediction. To address class imbalance in the dataset, the authors employ the SMOTEENN hybrid sampling technique, and Optuna is used for automated hyperparameter tuning. Before training, the Diabetes Prediction Dataset (DPD) was preprocessed using data cleaning, IQR-based outlier removal, and label encoding. CatBoost was evaluated against several other ML algorithms, including KNN, RF, DT, XGBoost, ETC, LightGBM, and AdaBoost, and outperformed them. The proposed hyperparameter-tuned CatBoost model achieved 98.93% accuracy, along with better precision, recall, F1-score, and AUC-ROC. In future work, the authors plan to extend the model to other datasets to assess generalization, and to develop predictive models that enable early detection and forecasting of diabetes progression at the individual patient level by uncovering patterns in clinically captured data.
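The preprocessing steps named in the abstract (IQR-based outlier removal followed by label encoding) can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the column names `bmi` and `gender`, the sample rows, and the fence multiplier of 1.5 (the conventional Tukey value) are all assumptions made for illustration, and the abstract does not state which quartile method or multiplier was used.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # method="inclusive" treats the data as the full population (assumption).
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, column, k=1.5):
    """Keep only the rows whose value in `column` lies within the IQR fences."""
    lower, upper = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if lower <= row[column] <= upper]


def label_encode(rows, column):
    """Map each distinct category in `column` to an integer (sorted order)."""
    classes = sorted({row[column] for row in rows})
    mapping = {label: index for index, label in enumerate(classes)}
    encoded = [{**row, column: mapping[row[column]]} for row in rows]
    return encoded, mapping


# Hypothetical sample rows; field names and values are illustrative only.
rows = [
    {"gender": "Female", "bmi": 22.0},
    {"gender": "Male", "bmi": 24.5},
    {"gender": "Female", "bmi": 27.1},
    {"gender": "Male", "bmi": 25.3},
    {"gender": "Female", "bmi": 23.8},
    {"gender": "Male", "bmi": 95.0},  # implausible value, expected to be dropped
]

cleaned = remove_outliers(rows, "bmi")
encoded, mapping = label_encode(cleaned, "gender")
```

In a full pipeline of the kind the abstract describes, resampling (SMOTEENN) would be applied to the training split only after this step, so that synthetic samples do not leak into the evaluation data; that ordering is a general practice, not a detail confirmed by the abstract.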
DOI: 10.33545/27076571.2025.v6.i2c.202
Pages: 229-237
How to cite this article:
Vishal Verma, Satish Kumar, Vandna Rani Verma, Alka Agrawal. An Intelligent Machine Learning Pipeline for Early Diabetes Prediction: CatBoost Ensemble with SMOTTEEN and Optuna Tuning. Int J Comput Artif Intell 2025;6(2):229-237. DOI: 10.33545/27076571.2025.v6.i2c.202