The explosion of unstructured and multimodal data has intensified the demand for intelligent systems capable of organizing, filtering, and retrieving high-dimensional embeddings with precision and speed. While vector databases have emerged as scalable backbones for semantic search across text, images, and clinical data, their effectiveness is fundamentally limited by the quality and contextual relevance of the ingested embeddings. This paper introduces a novel AI-driven semantic curation framework that redefines vector preprocessing through a fusion of transformer-based language models, contrastive learning, and dynamic clustering strategies.
Our pipeline goes beyond conventional ingestion by applying zero-shot semantic tagging, transformer encoding, and embedding refinement to ensure that only contextually salient, high-utility vectors are indexed. Evaluations across three critical domains (e-commerce, legal retrieval, and clinical informatics) demonstrate significant real-world gains: a 23% boost in top-5 precision and a 17% reduction in index size for product search; over 30% improvement in nDCG@10 and enhanced topic coherence for legal documents; and, in clinical data, a 40% drop in irrelevant matches with improved recall of meaningful records.
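The filtering stage of such a pipeline can be illustrated with a minimal sketch. The function below is not the paper's actual method; it stands in for "embedding refinement" with two illustrative heuristics: cosine similarity to the corpus centroid as a proxy for contextual salience, and pairwise similarity for near-duplicate removal. The function name and both thresholds are hypothetical.

```python
import numpy as np

def curate_embeddings(vectors, relevance_threshold=0.3, dedup_threshold=0.95):
    """Sketch of pre-index curation: keep vectors whose cosine similarity
    to the corpus centroid exceeds `relevance_threshold` (a stand-in for
    contextual salience), then drop near-duplicates above `dedup_threshold`.
    Returns the indices of the vectors that survive curation."""
    vectors = np.asarray(vectors, dtype=float)
    # Normalize rows so dot products are cosine similarities.
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.clip(norms, 1e-12, None)
    centroid = unit.mean(axis=0)
    centroid /= max(np.linalg.norm(centroid), 1e-12)
    # Salience filter: discard vectors far from the corpus centroid.
    salient = unit @ centroid >= relevance_threshold
    # Greedy dedup: keep a vector only if it is not too close to one already kept.
    kept = []
    for i in np.flatnonzero(salient):
        if all(unit[i] @ unit[j] < dedup_threshold for j in kept):
            kept.append(i)
    return np.array(kept)
```

In practice the salience signal would come from the learned tagging and contrastive components rather than a centroid heuristic, but the shape of the stage — score, filter, deduplicate, then index — is the same.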
Visual analyses using t-SNE and UMAP show that post-curation embeddings form denser, better-separated clusters, a structure that correlates directly with the retrieval gains. Additionally, the framework achieves up to a 20% latency reduction in semantic search, underscoring its efficiency.
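The "denser, better-separated clusters" observation can be quantified as well as visualized. One simple proxy, assuming labelled clusters (the function name and metric here are illustrative, not from the paper), is the ratio of mean inter-centroid distance to mean within-cluster spread: higher values indicate the tighter, more separated geometry the t-SNE/UMAP plots show qualitatively.

```python
import numpy as np

def separation_ratio(embeddings, labels):
    """Crude cluster-separation proxy: mean distance between cluster
    centroids divided by mean distance of points to their own centroid.
    Larger values = denser, better-separated clusters."""
    embeddings = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Per-cluster centroids.
    cents = {l: embeddings[labels == l].mean(axis=0) for l in np.unique(labels)}
    # Mean within-cluster spread (point to its own centroid).
    intra = np.mean([np.linalg.norm(e - cents[l]) for e, l in zip(embeddings, labels)])
    # Mean pairwise distance between centroids.
    cs = list(cents.values())
    inter = np.mean([np.linalg.norm(a - b)
                     for i, a in enumerate(cs) for b in cs[i + 1:]])
    return inter / max(intra, 1e-12)
```

Comparing this ratio before and after curation gives a scalar counterpart to the visual evidence, which is useful when embedding spaces are too large to inspect plot by plot.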
By embedding semantic intelligence at the data preparation layer, our framework transforms vector databases from passive storage systems into cognitively organized knowledge engines, establishing a new paradigm for scalable, explainable, and high-performance AI-driven retrieval.