International Journal of Computing and Artificial Intelligence

Impact Factor (RJIF): 5.57, P-ISSN: 2707-6571, E-ISSN: 2707-658X
Printed Journal   |   Refereed Journal   |   Peer Reviewed Journal

2025, Vol. 6, Issue 2, Part C

An Intelligent Machine Learning Pipeline for Early Diabetes Prediction: CatBoost Ensemble with SMOTTEEN and Optuna Tuning


Author(s): Vishal Verma, Satish Kumar, Vandna Rani Verma and Alka Agrawal

Abstract: Diabetes is one of the most widespread diseases worldwide. It arises when the level of glucose in the bloodstream exceeds normal levels. Predicting diabetes with high accuracy is crucial in the medical industry, and machine learning (ML) techniques play an essential role in building predictive models for healthcare analysis. This study proposes an ensemble approach based on the CatBoost algorithm for early diabetes prediction. To address class imbalance in the dataset, the authors employ the SMOTEENN hybrid sampling technique, and Optuna is used for automated hyperparameter tuning. The Diabetes Prediction Dataset (DPD) was preprocessed with data cleaning, IQR-based outlier removal, and label encoding before training. CatBoost was evaluated against several other ML algorithms, including KNN, RF, DT, XGBoost, ETC, LightGBM, and AdaBoost, and outperformed them. The proposed hyperparameter-tuned CatBoost model achieved 98.93% accuracy, with better precision, recall, F1-score, and AUC-ROC than the compared algorithms. In future work, the authors plan to extend the model to other datasets to test generalization, and to develop predictive models that enable early detection and forecasting of diabetes progression at the individual patient level by uncovering patterns in clinically captured data.
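The preprocessing steps named in the abstract (IQR-based outlier removal and label encoding) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the helper names, the toy records, the column names, and the conventional fence multiplier k = 1.5 are all assumptions made for illustration, and the abstract does not state which columns or thresholds were actually used.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(rows, column, k=1.5):
    """Keep only rows whose value in `column` lies inside the IQR fences."""
    low, high = iqr_bounds([row[column] for row in rows], k)
    return [row for row in rows if low <= row[column] <= high]

def label_encode(rows, column):
    """Replace categories in `column` with integer codes (sorted order)."""
    categories = sorted({row[column] for row in rows})
    mapping = {cat: code for code, cat in enumerate(categories)}
    encoded = [{**row, column: mapping[row[column]]} for row in rows]
    return encoded, mapping

# Hypothetical toy records; the column names are illustrative only
# and are not taken from the paper.
records = [
    {"gender": "Female", "bmi": 22.0},
    {"gender": "Male", "bmi": 24.5},
    {"gender": "Female", "bmi": 26.0},
    {"gender": "Male", "bmi": 27.5},
    {"gender": "Female", "bmi": 29.0},
    {"gender": "Male", "bmi": 95.0},  # implausible value outside the fences
]

cleaned = remove_outliers(records, "bmi")
encoded, gender_codes = label_encode(cleaned, "gender")
```

In a full pipeline of the kind the abstract describes, the cleaned and encoded table would then be resampled (for example with SMOTEENN) and passed to the classifier. Resampling is normally applied to the training split only, so that the evaluation data keeps its original class distribution.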

DOI: 10.33545/27076571.2025.v6.i2c.202

Pages: 229-237

How to cite this article:
Vishal Verma, Satish Kumar, Vandna Rani Verma, Alka Agrawal. An Intelligent Machine Learning Pipeline for Early Diabetes Prediction: CatBoost Ensemble with SMOTTEEN and Optuna Tuning. Int J Comput Artif Intell 2025;6(2):229-237. DOI: 10.33545/27076571.2025.v6.i2c.202