2023, Vol. 4, Issue 1, Part A
CleanRL: Reinforcement learning-driven framework for intelligent e-commerce log sanitization
Author(s): Yuvaraj Kavala
Abstract: E-commerce platforms continuously generate massive volumes of log data that capture customer interactions, transactional records, and system events. However, these logs often suffer from significant data quality issues, including missing values, inconsistent formats, duplicate entries, and erroneous fields. Traditional rule-based and supervised learning methods struggle to adapt to the evolving nature of such logs, which limits their scalability and generalizability. In this study, we propose an intelligent data cleaning framework that formulates the problem as a sequential decision-making process within a reinforcement learning (RL) paradigm. The task is modeled as a Markov Decision Process (MDP) in which an RL agent learns to take optimal cleaning actions (correction, deletion, or retention), guided by a composite reward function that balances accuracy, completeness, and correction cost. The framework employs a Deep Q-Network architecture trained on both a real-world clickstream dataset (RetailLog) containing over 2 million records and a synthetic dataset (ShopSim) with controlled error injection. Compared with a rule-based cleaner, supervised learning models, and the open-source DataPrep toolkit, our RL-based approach achieves superior performance, attaining an F1-score of 0.89, a correction accuracy of 84%, and a coverage rate of 78%, while reducing the average cleaning time to 19 seconds per batch. Ablation studies further highlight the importance of each reward component, and qualitative analyses reveal the agent’s ability to selectively clean impactful anomalies without overcorrection. These results establish reinforcement learning as a powerful, adaptive solution for automated data quality management in dynamic, large-scale e-commerce environments.
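The abstract describes the MDP formulation only at a high level, so the following is a minimal sketch under stated assumptions, not the author's implementation. It swaps the Deep Q-Network for a tabular Q-learning agent, treats each log field as a one-step episode, and uses hypothetical error types (`clean`, `missing`, `duplicate`), hypothetical reward weights (`W_ACC`, `W_COMP`, `W_COST`), and hypothetical action costs and outcome values, none of which are given in the abstract. Its purpose is to show how a composite reward that trades accuracy and completeness gains against correction cost can steer an agent toward retaining clean fields instead of overcorrecting them.

```python
import random
from enum import Enum


class Action(Enum):
    """The three cleaning actions named in the abstract."""
    RETAIN = 0
    CORRECT = 1
    DELETE = 2


# Hypothetical reward weights (the abstract does not specify them).
W_ACC, W_COMP, W_COST = 1.0, 0.5, 0.2

# Hypothetical cost of applying each action to one log field.
ACTION_COST = {Action.RETAIN: 0.0, Action.CORRECT: 1.0, Action.DELETE: 0.3}

# Hypothetical outcome table: (state, action) -> (accuracy gain, completeness gain).
# Here a "state" is simply the detected error type of the current log field.
OUTCOMES = {
    ("clean", Action.RETAIN): (0.0, 0.0),
    ("clean", Action.CORRECT): (-1.0, 0.0),    # overcorrection damages good data
    ("clean", Action.DELETE): (-1.0, -1.0),
    ("missing", Action.RETAIN): (0.0, -1.0),
    ("missing", Action.CORRECT): (1.0, 1.0),   # imputation fills the gap
    ("missing", Action.DELETE): (0.0, -0.5),
    ("duplicate", Action.RETAIN): (-1.0, 0.0),
    ("duplicate", Action.CORRECT): (-0.5, 0.0),
    ("duplicate", Action.DELETE): (1.0, 0.0),
}
STATES = ["clean", "missing", "duplicate"]


def composite_reward(state, action):
    """Weighted sum: reward accuracy and completeness gains, penalize correction cost."""
    acc_gain, comp_gain = OUTCOMES[(state, action)]
    return W_ACC * acc_gain + W_COMP * comp_gain - W_COST * ACTION_COST[action]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the three cleaning actions."""
    if rng.random() < epsilon:
        return rng.choice(list(Action))
    return max(Action, key=lambda a: q[(state, a)])


def train(episodes=5000, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning in which each log field is a one-step episode."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in Action}
    for _ in range(episodes):
        state = rng.choice(STATES)
        action = choose_action(q, state, epsilon, rng)
        reward = composite_reward(state, action)
        # One-step episode: there is no next state, so the TD target is just the reward.
        q[(state, action)] += alpha * (reward - q[(state, action)])
    return q


def greedy_policy(q):
    """Pick the highest-valued action for each error type."""
    return {s: max(Action, key=lambda a: q[(s, a)]) for s in STATES}


if __name__ == "__main__":
    policy = greedy_policy(train())
    for state, action in policy.items():
        print(f"{state:>9} -> {action.name}")
```

With these illustrative numbers, the learned greedy policy retains clean fields, corrects missing ones, and deletes duplicates. The cost term is what keeps the agent from "correcting" fields that are already valid; a real system would replace the lookup table with a neural Q-function over richer record features, as the Deep Q-Network in the abstract does.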
DOI: 10.33545/27076571.2023.v4.i1a.161 | Pages: 58-64
How to cite this article:
Yuvaraj Kavala. CleanRL: Reinforcement learning-driven framework for intelligent e-commerce log sanitization. Int J Comput Artif Intell 2023;4(1):58-64. DOI: 10.33545/27076571.2023.v4.i1a.161