Presentation Information
[5L3-OS-6b-04] Semantic Manifolds Are Low-Dimensional, But Retrieval Is Not: Ranking Stability in Dense Embeddings
〇Noriyuki Yamamoto1 (1. GIG Intelligence Inc.)
Keywords:
semantic embedding, intrinsic dimension, manifold hypothesis, dense retrieval, retrieval-augmented generation
Embedding representations derived from large language models are widely used for semantic search and retrieval-augmented generation. Although they are often interpreted through the manifold hypothesis—that semantic meaning lies on a low-dimensional manifold—dimensionality reduction is known to degrade retrieval performance.
In this work, we show that this discrepancy arises from a difference between the geometric structure of semantic representations and the intrinsic requirements of retrieval tasks. By combining global dimensionality measures based on the participation ratio with local intrinsic dimension estimators such as TwoNN and Levina–Bickel MLE, we demonstrate that semantic freedom is governed by local geometric properties. We observe phase-transition-like behavior in ranking performance as the embedding dimension is reduced, and show that this phenomenon originates from insufficient resolution to discriminate semantically close items.
Our results provide a unified geometric explanation for why low-dimensional representations can preserve meaning while effective retrieval requires higher-dimensional embeddings, and offer insights for RAG system design.
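The two families of dimensionality measures named above can be illustrated on synthetic data. The sketch below is not the authors' code: it computes the participation ratio (a global measure derived from covariance eigenvalues) and a TwoNN-style local intrinsic-dimension estimate (the MLE based on ratios of first and second nearest-neighbour distances) for points drawn from an assumed 3-dimensional latent space embedded in 64 ambient dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

def participation_ratio(X):
    # Global measure: PR = (sum lam)^2 / (sum lam^2) over the
    # eigenvalues lam of the sample covariance matrix.
    lam = np.clip(np.linalg.eigvalsh(np.cov(X, rowvar=False)), 0.0, None)
    return lam.sum() ** 2 / (lam ** 2).sum()

def twonn_dimension(X):
    # TwoNN-style MLE: d = N / sum_i log(r2_i / r1_i), where r1_i and
    # r2_i are point i's first and second nearest-neighbour distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    D.sort(axis=1)          # D[:, 0] is the zero self-distance
    mu = D[:, 2] / D[:, 1]  # ratio of 2nd to 1st NN distance
    return len(X) / np.log(mu).sum()

# Synthetic stand-in for embeddings: 3 latent dims in 64 ambient dims.
Z = rng.standard_normal((500, 3))
A = rng.standard_normal((3, 64))
X = Z @ A + 0.01 * rng.standard_normal((500, 64))

print(f"participation ratio: {participation_ratio(X):.1f}")
print(f"TwoNN intrinsic dim: {twonn_dimension(X):.1f}")
```

Both estimates come out near the latent dimension 3 here; on real embedding models the abstract's claim is that these local estimates stay low even when retrieval needs many more coordinates.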
