Presentation Information
[2Yin-B-11]Improving Large Language Model's Detectability via Synthetic Feedback
〇Hiroki Yamauchi1, Akira Kawabata1, Yuya Taguchi1, Hideaki Tamori1, Naoaki Okazaki2, Kentaro Inui3,4,5 (1. The Asahi Shimbun Company, 2. Institute of Science Tokyo, 3. MBZUAI, 4. Tohoku University, 5. RIKEN)
Keywords:
Verifier, LLM, FactCheck, Alignment, DPO
Large Language Models (LLMs) sometimes produce factually unsupported or incorrect content, often referred to as hallucinations. Prior work has primarily focused on preventing or reducing such errors, whereas we optimize generation to make hallucinations easier to detect. To improve this detectability at scale without additional human feedback, we propose an optimization method based on synthetic feedback generated by an LLM. Specifically, for each generated answer, we assign a detector-correctness label based on whether a detection model’s prediction matches the factuality label. Using both the factuality and detector-correctness signals, we fine-tune the LLM via preference optimization to improve detectability while minimizing degradation in factuality. Experiments on JSQuAD show that our method increases detectability relative to an unoptimized baseline, suggesting that synthetic-feedback-driven preference optimization can be a scalable approach to improving the detectability of hallucinations in LLM outputs.
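The abstract does not give implementation details, but the labeling and pairing step can be illustrated with a short sketch. The Python fragment below is a hypothetical, minimal reconstruction, not the authors' code: the `Answer` fields, the scoring rule, and the pairing scheme are all assumptions. It assigns each generated answer a detector-correctness label by comparing a detection model's prediction against the gold factuality label, then builds preference pairs in which the more detectable answer is chosen, with factual answers preferred as a tie-break to limit factuality degradation.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Answer:
    text: str
    factual: bool                # gold factuality label for this answer
    detector_says_factual: bool  # the detection model's prediction

def detector_correct(a: Answer) -> bool:
    # Detector-correctness label: does the detector's prediction
    # match the gold factuality label?
    return a.detector_says_factual == a.factual

def build_preference_pairs(prompt: str, answers: list[Answer]) -> list[dict]:
    # Hypothetical pairing rule: prefer the answer whose status the
    # detector gets right; break ties in favor of factual answers so that
    # optimizing for detectability does not erode factuality.
    def score(a: Answer) -> tuple[int, int]:
        return (int(detector_correct(a)), int(a.factual))

    pairs = []
    for a, b in combinations(answers, 2):
        if score(a) == score(b):
            continue  # equal scores carry no preference signal
        chosen, rejected = (a, b) if score(a) > score(b) else (b, a)
        pairs.append({"prompt": prompt,
                      "chosen": chosen.text,
                      "rejected": rejected.text})
    return pairs

# Toy usage: two sampled answers to the same question, one of which is a
# hallucination that the detector fails to flag.
answers = [
    Answer("Tokyo is the capital of Japan.", factual=True, detector_says_factual=True),
    Answer("Kyoto is the capital of Japan.", factual=False, detector_says_factual=True),
]
print(build_preference_pairs("What is the capital of Japan?", answers))
```

The resulting prompt/chosen/rejected records match the preference-dataset format accepted by common DPO implementations such as TRL's DPOTrainer, which could then perform the preference-optimization fine-tuning described in the abstract.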
