Presentation Information

[5O2-IS-5a-04]Large Language Models Do Not Solve Hallucinations in Materials Knowledge: A Preliminary Analysis

〇Yutong Duan1, Satoshi Kosugi1, Manabu Okumura1, Kotaro Funakoshi1 (1. Institute of Science Tokyo)
regular

Keywords:

Large Language Models, Materials Science, Numerical Hallucination

Identifier-conditioned numeric retrieval is a high-stakes use case for LLM-based scientific assistants, where outputs must be correct at the record level. We evaluate structured property retrieval from the Materials Project (MP) database under five controlled settings across three models: GPT-5-mini, Gemini-2.5-flash-lite, and Llama-4 Scout-17B-16E. LLM-only baselines fail completely and exhibit structured hallucination patterns, including near-zero/default-value collapse for band gap. Tool grounding substantially improves accuracy but remains brittle due to last-mile output and formatting errors; prompt-only mitigations are also unreliable. We propose Retrieve-Verify-Override (RVO), a lightweight correctness layer that deterministically verifies the model output against the retrieved MP record and overrides it on any mismatch or parse failure. RVO achieves perfect record-level accuracy across model families in our benchmark, indicating that reliable numeric retrieval requires deterministic enforcement beyond prompting.
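The override step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, the numeric tolerance, and the comparison logic are all assumptions made for exposition.

```python
# Hypothetical sketch of a Retrieve-Verify-Override (RVO) correctness layer:
# parse the model's numeric answer, verify it against the value retrieved
# from the database record, and override on any mismatch or parse failure.

def rvo_override(model_output: str, record_value: float, tol: float = 1e-6) -> float:
    """Return a record-faithful numeric value.

    All names here are illustrative; the paper's RVO layer may differ.
    """
    try:
        parsed = float(model_output.strip())
    except ValueError:
        # Parse failure: fall back to the retrieved record value.
        return record_value
    if abs(parsed - record_value) > tol:
        # Mismatch with the retrieved record: deterministic override.
        return record_value
    # Verified: the model's output matches the record.
    return parsed

# Example: a model that collapses a band gap to 0.0 eV is corrected
# to the retrieved record value (here, a made-up 1.12 eV).
print(rvo_override("0.0", 1.12))   # overridden
print(rvo_override("1.12", 1.12))  # verified, kept
```

The key design point this sketch tries to convey is that correctness is enforced deterministically outside the model, so the final answer is record-faithful regardless of how the model formats or garbles its output.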