International Journal of Cloud Computing and Database Management

Impact Factor (RJIF): 5.4, P-ISSN: 2707-5907, E-ISSN: 2707-5915
Printed Journal   |   Refereed Journal   |   Peer Reviewed Journal

2025, Vol. 6, Issue 1, Part B

Intelligent data trust: A metadata-centric AI approach to scalable quality governance


Author(s): Arunkumar Medisetty

Abstract:
Ensuring high data quality (DQ) across large-scale, heterogeneous datasets remains a critical challenge in modern data ecosystems. Traditional rule-based DQ frameworks are often brittle, labour-intensive, and poorly suited to dynamic environments in which schemas evolve. This paper presents a novel AI- and metadata-driven approach that uses machine learning and metadata intelligence to automate the inference, validation, and enforcement of DQ rules across enterprise data pipelines. The proposed framework integrates five modular components: metadata profiling, AI-based rule generation, human-in-the-loop feedback, scalable rule execution, and continuous monitoring with drift detection. Metadata is harvested from sources such as Apache Atlas and Hive Metastore to capture schema structure, lineage, and statistical patterns. Machine learning models, including decision trees and clustering algorithms, then analyse this metadata to generate candidate rules. These rules are validated with human feedback and enforced at scale using Spark and AWS Glue across both batch and streaming workloads.
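To make the profiling and rule-generation steps concrete, the following is a minimal sketch and not the paper's implementation. It swaps the decision-tree and clustering models for simple threshold heuristics, and every name in it (`CandidateRule`, `profile_column`, `infer_rules`, `violates`, `max_categories`) is hypothetical. It only illustrates how column statistics might be turned into candidate rules that a reviewer could then accept or reject.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class CandidateRule:
    """Hypothetical container for one auto-inferred DQ rule."""
    column: str
    kind: str      # "not_null", "range", or "allowed_values"
    params: tuple  # rule parameters, e.g. (min, max) for a range rule


def profile_column(values: list[Any]) -> dict:
    """Collect simple statistics of the kind a metadata profiler might record."""
    non_null = [v for v in values if v is not None]
    stats = {
        "count": len(values),
        "null_fraction": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": set(non_null),
        # bool is excluded because it is a subclass of int in Python
        "numeric": bool(non_null) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in non_null
        ),
    }
    if stats["numeric"]:
        stats["min"], stats["max"] = min(non_null), max(non_null)
    return stats


def infer_rules(column: str, stats: dict, max_categories: int = 5) -> list[CandidateRule]:
    """Heuristic stand-in for the ML rule generator (an assumption of this sketch)."""
    rules = []
    if stats["count"] and stats["null_fraction"] == 0.0:
        rules.append(CandidateRule(column, "not_null", ()))
    if stats["numeric"]:
        rules.append(CandidateRule(column, "range", (stats["min"], stats["max"])))
    elif 0 < len(stats["distinct"]) <= max_categories:
        allowed = tuple(sorted(stats["distinct"], key=str))
        rules.append(CandidateRule(column, "allowed_values", allowed))
    return rules


def violates(rule: CandidateRule, value: Any) -> bool:
    """Return True when a single value breaks the rule."""
    if rule.kind == "not_null":
        return value is None
    if value is None:
        return False  # nulls are the concern of not_null rules only
    if rule.kind == "range":
        low, high = rule.params
        return not (low <= value <= high)
    if rule.kind == "allowed_values":
        return value not in rule.params
    raise ValueError(f"unknown rule kind: {rule.kind}")


# Toy sample, invented for illustration only.
sample = {
    "amount": [10.0, 25.5, 99.9],
    "currency": ["USD", "EUR", "USD"],
    "note": ["a", None, "b"],
}
candidate_rules = {
    col: infer_rules(col, profile_column(vals)) for col, vals in sample.items()
}
```

In a pipeline of the kind the abstract describes, the candidate rules would pass through human review before enforcement; here `violates` simply shows how an accepted rule could be checked against one incoming value.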
A real-world prototype deployed on cloud-native infrastructure was evaluated on 15 datasets spanning finance, healthcare, and retail, totalling over 1.2 billion records. The system achieved 87% precision in auto-inferred rules, a 60% reduction in manual rule authoring effort, and a 45% improvement in anomaly detection compared to static rule baselines. Moreover, 93% of rules remained valid after schema drift, demonstrating strong adaptability. Results also show execution times of 18-22 seconds per 10 million records, enabling real-time enforcement at scale. This research highlights the effectiveness of combining metadata automation with AI to enable scalable, adaptive, and resilient DQ governance, offering a reusable architecture for intelligent data quality management in enterprise environments.
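The evaluation figures above rest on a few simple ratios. The sketch below shows one plausible way to compute them; the definitions are assumptions, since the abstract does not state them. Precision is taken as accepted rules over generated rules, and a rule is taken to survive drift if its column still exists with the same type. The function names and the toy schema are hypothetical. The throughput lines only restate the reported 18-22 seconds per 10 million records as records per second.

```python
def precision(accepted: int, generated: int) -> float:
    """Share of auto-inferred rules that reviewers accepted (assumed definition)."""
    if generated == 0:
        raise ValueError("no rules were generated")
    return accepted / generated


def surviving_rules(rules: list[dict], new_schema: dict[str, str]) -> list[dict]:
    """Keep rules whose column still exists with the same type after drift.

    This matching criterion is an assumption of the sketch.
    """
    return [r for r in rules if new_schema.get(r["column"]) == r["type"]]


def records_per_second(records: int, seconds: float) -> float:
    """Convert a batch size and run time into a throughput figure."""
    return records / seconds


# Toy rules and schema, invented for illustration only.
rules = [
    {"id": "r1", "column": "amount", "type": "double"},
    {"id": "r2", "column": "currency", "type": "string"},
    {"id": "r3", "column": "txn_date", "type": "date"},
    {"id": "r4", "column": "branch", "type": "string"},
]
# After drift: txn_date changed type and branch was replaced by region.
drifted_schema = {
    "amount": "double",
    "currency": "string",
    "txn_date": "timestamp",
    "region": "string",
}

kept = surviving_rules(rules, drifted_schema)
retention = len(kept) / len(rules)

# Reported range from the abstract: 18-22 seconds per 10 million records.
slowest = records_per_second(10_000_000, 22)
fastest = records_per_second(10_000_000, 18)
```

Read this way, the reported run times correspond to roughly 455,000 to 556,000 records per second.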



DOI: 10.33545/27075907.2025.v6.i1b.92

Pages: 112-119


How to cite this article:
Arunkumar Medisetty. Intelligent data trust: A metadata-centric AI approach to scalable quality governance. Int J Cloud Comput Database Manage 2025;6(1):112-119. DOI: 10.33545/27075907.2025.v6.i1b.92