Presentation Information
[2Yin-B-11]Improving Large Language Model's Detectability via Synthetic Feedback
〇Hiroki Yamauchi1, Akira Kawabata1, Yuya Taguchi1, Hideaki Tamori1, Naoaki Okazaki2, Kentaro Inui3,4,5 (1. The Asahi Shimbun Company, 2. Institute of Science Tokyo, 3. MBZUAI, 4. Tohoku University, 5. RIKEN)
Keywords:
Verifier, LLM, FactCheck, Alignment, DPO
Large Language Models (LLMs) sometimes produce factually unsupported or incorrect content, often referred to as hallucinations. Prior work has primarily focused on preventing or reducing such errors, whereas we optimize generation to make hallucinations easier to detect. To improve this detectability at scale without additional human feedback, we propose an optimization method based on synthetic feedback generated by an LLM. Specifically, for each generated answer, we assign a detector-correctness label based on whether a detection model’s prediction matches the factuality label. Using both the factuality and detector-correctness signals, we fine-tune the LLM via preference optimization to improve detectability while minimizing degradation in factuality. Experiments on JSQuAD show that our method increases detectability relative to an unoptimized baseline, suggesting that synthetic-feedback-driven preference optimization can be a scalable approach to improving the detectability of hallucinations in LLM outputs.
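The abstract does not give implementation details, but the labeling and pairing step can be illustrated with a short sketch. The Python fragment below is a hypothetical, minimal reconstruction, not the authors' code: the `Answer` fields, the scoring rule, and the pairing scheme are all assumptions. It assigns each generated answer a detector-correctness label by comparing a detection model's prediction against the gold factuality label, then builds preference pairs in which the more detectable answer is chosen, with factual answers preferred as a tie-break to limit factuality degradation.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class Answer:
    text: str
    factual: bool                # gold factuality label for this answer
    detector_says_factual: bool  # the detection model's prediction

def detector_correct(a: Answer) -> bool:
    # Detector-correctness label: does the detector's prediction
    # match the gold factuality label?
    return a.detector_says_factual == a.factual

def build_preference_pairs(prompt: str, answers: list[Answer]) -> list[dict]:
    # Hypothetical pairing rule: prefer the answer whose status the
    # detector gets right; break ties in favor of factual answers so that
    # optimizing for detectability does not erode factuality.
    def score(a: Answer) -> tuple[int, int]:
        return (int(detector_correct(a)), int(a.factual))

    pairs = []
    for a, b in combinations(answers, 2):
        if score(a) == score(b):
            continue  # equal scores carry no preference signal
        chosen, rejected = (a, b) if score(a) > score(b) else (b, a)
        pairs.append({"prompt": prompt,
                      "chosen": chosen.text,
                      "rejected": rejected.text})
    return pairs

# Toy usage: two sampled answers to the same question, one of which is a
# hallucination that the detector fails to flag.
answers = [
    Answer("Tokyo is the capital of Japan.", factual=True, detector_says_factual=True),
    Answer("Kyoto is the capital of Japan.", factual=False, detector_says_factual=True),
]
print(build_preference_pairs("What is the capital of Japan?", answers))
```

The resulting prompt/chosen/rejected records match the preference-dataset format accepted by common DPO implementations such as TRL's DPOTrainer, which could then perform the preference-optimization fine-tuning described in the abstract.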
