2025, Vol. 6, Issue 1, Part B
Intelligent data trust: A metadata-centric AI approach to scalable quality governance
Author(s): Arunkumar Medisetty
Abstract: Ensuring high data quality (DQ) across large-scale, heterogeneous datasets remains a critical challenge in modern data ecosystems. Traditional rule-based DQ frameworks are often brittle, labour-intensive, and poorly suited to dynamic environments where schemas evolve. This paper presents an AI-driven, metadata-centric approach that applies machine learning to harvested metadata in order to automate the inference, validation, and enforcement of DQ rules across enterprise data pipelines. The proposed framework integrates five modular components: metadata profiling, AI-based rule generation, human-in-the-loop feedback, scalable rule execution, and continuous monitoring with drift detection. Metadata is harvested from sources such as Apache Atlas and Hive Metastore to capture schema structure, lineage, and statistical patterns, which are then analysed using machine learning models (including decision trees and clustering algorithms) to generate candidate rules. These rules are validated with human feedback and enforced at scale using Spark and AWS Glue across both batch and streaming workloads.
A real-world prototype deployed on cloud-native infrastructure was evaluated on 15 datasets spanning finance, healthcare, and retail, totalling over 1.2 billion records. The system achieved 87% precision in auto-inferred rules, a 60% reduction in manual rule authoring effort, and a 45% improvement in anomaly detection compared to static rule baselines. Moreover, 93% of rules remained valid after schema drift, demonstrating strong adaptability. Results also show execution times of 18-22 seconds per 10 million records, enabling real-time enforcement at scale. This research highlights the effectiveness of combining metadata automation with AI to enable scalable, adaptive, and resilient DQ governance, offering a reusable architecture for intelligent data quality management in enterprise environments.
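The abstract does not give implementation details, so the following is only a minimal sketch of the profile, infer, and validate steps it describes, not the author's implementation. All names (`Rule`, `profile`, `infer_rules`, `validate`) are hypothetical, the data is held in plain Python dictionaries rather than harvested from Apache Atlas or Hive Metastore, and simple threshold heuristics stand in for the decision-tree and clustering models mentioned above.

```python
from dataclasses import dataclass
from functools import partial
from typing import Any, Callable

Row = dict[str, Any]


def _is_number(v: Any) -> bool:
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(v, (int, float)) and not isinstance(v, bool)


def _not_null(values: list[Any]) -> bool:
    return all(v is not None for v in values)


def _unique(values: list[Any]) -> bool:
    present = [v for v in values if v is not None]
    return len(set(present)) == len(present)


def _in_range(values: list[Any], lo: float, hi: float) -> bool:
    present = [v for v in values if v is not None]
    return all(_is_number(v) and lo <= v <= hi for v in present)


@dataclass(frozen=True)
class Rule:
    """A candidate DQ rule (hypothetical structure, for illustration only)."""
    column: str
    kind: str  # "not_null", "unique", or "range"
    check: Callable[[list[Any]], bool]


def profile(rows: list[Row], column: str) -> dict[str, Any]:
    """Collect simple statistical metadata for one column."""
    values = [r.get(column) for r in rows]
    present = [v for v in values if v is not None]
    all_numeric = bool(present) and all(_is_number(v) for v in present)
    return {
        "null_fraction": 1 - len(present) / len(values) if values else 0.0,
        "distinct_fraction": len(set(present)) / len(present) if present else 0.0,
        "min": min(present) if all_numeric else None,
        "max": max(present) if all_numeric else None,
    }


def infer_rules(rows: list[Row]) -> list[Rule]:
    """Turn column profiles into candidate rules using fixed heuristics.

    Assumption: these thresholds replace the ML models named in the abstract.
    """
    columns = list(dict.fromkeys(k for r in rows for k in r))
    rules: list[Rule] = []
    for col in columns:
        stats = profile(rows, col)
        if stats["null_fraction"] == 0.0:
            rules.append(Rule(col, "not_null", _not_null))
        if stats["distinct_fraction"] == 1.0:
            rules.append(Rule(col, "unique", _unique))
        if stats["min"] is not None:
            check = partial(_in_range, lo=stats["min"], hi=stats["max"])
            rules.append(Rule(col, "range", check))
    return rules


def validate(rules: list[Rule], rows: list[Row]) -> list[tuple[str, str]]:
    """Return (column, kind) for every rule the batch violates."""
    violations = []
    for rule in rules:
        values = [r.get(rule.column) for r in rows]
        if not rule.check(values):
            violations.append((rule.column, rule.kind))
    return violations


if __name__ == "__main__":
    history = [
        {"id": 1, "amount": 10.0, "note": None},
        {"id": 2, "amount": 25.5, "note": "x"},
        {"id": 3, "amount": 10.0, "note": "y"},
    ]
    new_batch = [
        {"id": 2, "amount": -5.0, "note": None},
        {"id": 2, "amount": 12.0, "note": "z"},
    ]
    print(validate(infer_rules(history), new_batch))
```

In a pipeline of the kind the abstract outlines, candidate rules like these would presumably pass through human review before enforcement, since heuristics inferred from a small sample (for example, a range learned from early identifiers) can be too strict for later data.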
DOI: 10.33545/27075907.2025.v6.i1b.92
Pages: 112-119
How to cite this article:
Arunkumar Medisetty. Intelligent data trust: A metadata-centric AI approach to scalable quality governance. Int J Cloud Comput Database Manage 2025;6(1):112-119. DOI: 10.33545/27075907.2025.v6.i1b.92