[4K4-GS-6b-04]Examination of Quality Evaluation Criteria for Chit-Chat in Dialogue Systems and Attempt to Build a Quality Evaluation System
〇Shuichi Hirukawa1, Yuya Goto1, Makoto Shiomi1, Shinji Shinjo1, Shigeto Yoshida1 (1. Sharp Corporation)
Keywords: LLM, dialogue system, Japanese language, LLM-as-a-judge, casual conversation
Developing truly user-oriented dialogue systems requires methods that can quantitatively evaluate "conversational preferability." However, no comprehensive evaluation methodology had previously been established for Japanese. In this study, we developed an automatic evaluation system that assesses conversational preferability from multiple perspectives. Through a literature survey, we extracted 29 factors influencing human perception of dialogue quality and classified them into 9 fundamental factors, 13 user-dependent factors, and 7 system-dependent factors. Focusing on the 9 fundamental factors, we conducted subjective evaluation experiments to analyze their sensitivity and importance. Based on these results, we designed an LLM-as-a-judge evaluation prompt for each of the 9 factors and devised a scoring method, yielding an automatic evaluation system. Applied to dialogue agent responses, the system's scores showed a strong correlation with human judgments.
In future work, we plan to incorporate user-dependent factors to realize more human-oriented dialogue systems.
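The per-factor LLM-as-a-judge scoring described above might be sketched as follows. This is a minimal illustration, not the paper's implementation: the factor names are placeholders (the abstract does not list the 9 fundamental factors), the prompt template is hypothetical, the judge is stubbed out, and the weighted mean is an assumed aggregation rule.

```python
from statistics import mean
from typing import Callable, Dict, Optional

# Placeholder identifiers for the paper's 9 fundamental factors
# (the actual factor names are not given in the abstract).
FACTORS = [f"factor_{i}" for i in range(1, 10)]


def build_judge_prompt(factor: str, context: str, response: str) -> str:
    """Compose a per-factor evaluation prompt for an LLM judge.

    The wording is a hypothetical template; the paper designs one such
    prompt per factor based on subjective-evaluation results.
    """
    return (
        f"You are evaluating a chit-chat response on the factor: {factor}.\n"
        f"Dialogue context:\n{context}\n"
        f"System response:\n{response}\n"
        "Rate the response from 1 (poor) to 5 (excellent). "
        "Reply with the number only."
    )


def judge_all_factors(
    context: str,
    response: str,
    llm_judge: Callable[[str], int],
) -> Dict[str, int]:
    """Query the judge LLM once per factor and collect integer scores.

    `llm_judge` is any callable mapping a prompt to a 1-5 score
    (e.g. an API call whose output is parsed to an int).
    """
    return {
        factor: llm_judge(build_judge_prompt(factor, context, response))
        for factor in FACTORS
    }


def aggregate_scores(
    scores: Dict[str, int],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Combine per-factor scores into one preferability score.

    A (weighted) mean is an assumed scoring rule; the paper's actual
    scoring method may differ.
    """
    if weights is None:
        return mean(scores.values())
    total = sum(weights.get(f, 1.0) for f in scores)
    return sum(s * weights.get(f, 1.0) for f, s in scores.items()) / total
```

A usage example with a stub judge: `aggregate_scores(judge_all_factors("Hello", "Hi there!", lambda prompt: 4))` returns the mean of nine identical scores, 4.0.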
