CleanRL: Reinforcement learning-driven framework for intelligent e-commerce log sanitization

Yuvaraj Kavala

doi:10.33545/27076571.2023.v4.i1a.161

P-ISSN: 2707-6571
E-ISSN: 2707-658X

2023, Vol. 4, Issue 1, Part A

Author(s): Yuvaraj Kavala

Abstract: E-commerce platforms continuously generate massive volumes of log data that capture customer interactions, transactional records, and system events. These logs often suffer from significant data quality issues, including missing values, inconsistent formats, duplicate entries, and erroneous fields. Traditional rule-based and supervised learning methods struggle to adapt as such logs evolve, which limits their scalability and generalizability. In this study, we propose an intelligent data cleaning framework that formulates the problem as a sequential decision-making process within a reinforcement learning (RL) paradigm. The task is modeled as a Markov Decision Process (MDP) in which an RL agent learns to take optimal cleaning actions (correction, deletion, or retention), guided by a composite reward function that balances accuracy, completeness, and correction cost. The framework employs a Deep Q-Network architecture trained on both a real-world clickstream dataset (RetailLog) containing over 2 million records and a synthetic dataset (ShopSim) with controlled error injection. Compared to a rule-based cleaner, supervised learning models, and the open-source DataPrep toolkit, our RL-based approach achieves superior performance, attaining an F1-score of 0.89, a correction accuracy of 84%, and a coverage rate of 78%, while reducing average cleaning time to 19 seconds per batch. Ablation studies further highlight the importance of each reward component, and qualitative analyses reveal the agent’s ability to selectively clean impactful anomalies without overcorrection. These results establish reinforcement learning as a powerful, adaptive solution for automated data quality management in dynamic, large-scale e-commerce environments.
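To make the MDP formulation in the abstract concrete, the following is a minimal Python sketch of the three cleaning actions, a composite reward, and the standard one-step Q-learning target that a Deep Q-Network regresses toward. It is an illustration under stated assumptions, not the article's implementation: the abstract does not give the reward's functional form or weights, so the linear combination, the `RewardWeights` defaults, the discount factor, and all identifiers (`CleaningAction`, `composite_reward`, `q_learning_target`) are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class CleaningAction(Enum):
    """The three actions named in the abstract."""
    CORRECT = "correct"
    DELETE = "delete"
    RETAIN = "retain"


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the abstract does not report the actual values.
    accuracy: float = 1.0
    completeness: float = 0.5
    cost: float = 0.2


def composite_reward(accuracy_gain: float,
                     completeness_gain: float,
                     correction_cost: float,
                     weights: RewardWeights = RewardWeights()) -> float:
    """Assumed linear form: reward accuracy and completeness, penalize cost."""
    return (weights.accuracy * accuracy_gain
            + weights.completeness * completeness_gain
            - weights.cost * correction_cost)


def q_learning_target(reward: float,
                      next_q_values: dict,
                      gamma: float = 0.99,
                      done: bool = False) -> float:
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a')."""
    if done:
        # No future value is bootstrapped from a terminal state.
        return reward
    return reward + gamma * max(next_q_values.values())


if __name__ == "__main__":
    # A successful correction: accuracy improves, at some correction cost.
    r = composite_reward(accuracy_gain=1.0, completeness_gain=0.0,
                         correction_cost=1.0)
    # Illustrative Q-value estimates for the next log record's state.
    next_q = {CleaningAction.CORRECT: 0.4,
              CleaningAction.DELETE: -0.1,
              CleaningAction.RETAIN: 0.2}
    print(q_learning_target(r, next_q))
```

In this sketch the cost term is what discourages the agent from correcting every record, which matches the abstract's remark about cleaning impactful anomalies without overcorrection; how the article actually weighs the three components is not stated in the abstract.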

Pages: 58-64

International Journal of Computing and Artificial Intelligence

How to cite this article:

Yuvaraj Kavala. CleanRL: Reinforcement learning-driven framework for intelligent e-commerce log sanitization. Int J Comput Artif Intell 2023;4(1):58-64. DOI: 10.33545/27076571.2023.v4.i1a.161

Impact Factor (RJIF): 5.57, P-ISSN: 2707-6571, E-ISSN: 2707-658X
