International Journal of Computing and Artificial Intelligence

Impact Factor (RJIF): 5.57, P-ISSN: 2707-6571, E-ISSN: 2707-658X
Printed Journal   |   Refereed Journal   |   Peer Reviewed Journal

2023, Vol. 4, Issue 1, Part A

CleanRL: Reinforcement learning-driven framework for intelligent e-commerce log sanitization


Author(s): Yuvaraj Kavala

Abstract: E-commerce platforms continuously generate massive volumes of log data that capture customer interactions, transactional records, and system events. These logs often suffer from significant data quality issues, including missing values, inconsistent formats, duplicate entries, and erroneous fields. Traditional rule-based and supervised learning methods struggle to adapt as the logs evolve, which limits their scalability and generalizability. In this study, we propose an intelligent data cleaning framework that formulates the problem as a sequential decision-making process within a reinforcement learning (RL) paradigm. The task is modeled as a Markov Decision Process (MDP) in which an RL agent learns to take optimal cleaning actions (correction, deletion, or retention), guided by a composite reward function that balances accuracy, completeness, and correction cost. The framework employs a Deep Q-Network architecture trained on both a real-world clickstream dataset (RetailLog) containing over 2 million records and a synthetic dataset (ShopSim) with controlled error injection. Compared to a rule-based cleaner, supervised learning models, and the open-source DataPrep toolkit, our RL-based approach performs best, attaining an F1-score of 0.89, a correction accuracy of 84%, and a coverage rate of 78%, while reducing average cleaning time to 19 seconds per batch. Ablation studies further highlight the importance of each reward component, and qualitative analyses show that the agent selectively cleans impactful anomalies without overcorrecting. These results establish reinforcement learning as a powerful, adaptive solution for automated data quality management in dynamic, large-scale e-commerce environments.
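The composite reward and the Q-learning target described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the weight values, the per-action costs, the scoring rules, and the names `RewardWeights`, `ACTION_COST`, `composite_reward`, and `td_target` are all assumptions introduced here for illustration, since the abstract states only that the reward balances accuracy, completeness, and correction cost over the actions correction, deletion, and retention.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """The three cleaning actions named in the abstract."""
    CORRECT = "correct"
    DELETE = "delete"
    RETAIN = "retain"


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the abstract does not give the actual values.
    accuracy: float = 1.0
    completeness: float = 0.5
    cost: float = 0.2


# Hypothetical per-action costs: correcting is assumed to be the most expensive.
ACTION_COST = {Action.CORRECT: 1.0, Action.DELETE: 0.3, Action.RETAIN: 0.0}


def composite_reward(action, record_is_dirty, fix_is_right=True,
                     weights=RewardWeights()):
    """Score one cleaning decision on a single log record (assumed rules)."""
    if action is Action.CORRECT:
        # Correcting a clean record, or applying a wrong fix, is penalized.
        good_outcome = record_is_dirty and fix_is_right
    elif action is Action.DELETE:
        good_outcome = record_is_dirty
    else:  # Action.RETAIN
        good_outcome = not record_is_dirty

    accuracy = 1.0 if good_outcome else -1.0
    # A record that stays in the log counts toward completeness.
    completeness = 0.0 if action is Action.DELETE else 1.0

    return (weights.accuracy * accuracy
            + weights.completeness * completeness
            - weights.cost * ACTION_COST[action])


def td_target(reward, next_q_values, gamma=0.9, done=False):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values.values())


if __name__ == "__main__":
    for action in Action:
        for dirty in (True, False):
            r = composite_reward(action, record_is_dirty=dirty)
            print(f"{action.value:8s} dirty={dirty!s:5s} reward={r:+.2f}")
```

Under these assumed weights, retaining a clean record scores highest and correcting a clean record is penalized, which mirrors the abstract's point about avoiding overcorrection. In a Deep Q-Network, `td_target` would supply the regression target for the network's predicted Q-value of the chosen action.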

DOI: 10.33545/27076571.2023.v4.i1a.161

Pages: 58-64


How to cite this article:
Yuvaraj Kavala. CleanRL: Reinforcement learning-driven framework for intelligent e-commerce log sanitization. Int J Comput Artif Intell 2023;4(1):58-64. DOI: 10.33545/27076571.2023.v4.i1a.161