The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

[4Yin-A-60] Development of a Dialogue System Capable of Real-Time Backchanneling

〇Sho Katsuki¹, Yoshinobu Kano¹ (1. Shizuoka University)

Keywords:

dialogue system

With the rapid advancement of large language models (LLMs), there is growing demand for voice-based dialogue systems that can serve as everyday companions. However, conventional Voice Activity Detection (VAD) often misinterprets user fillers (e.g., "uhm...") or backchannels as the end of an utterance, which causes the system to interrupt the user unnaturally.
We adopted OpenAI's ChatGPT Realtime API and proposed a parallel architecture that integrates a "Judge Model" to decide in real time whether the system should respond or keep waiting.
Furthermore, to emulate a "cheerful friend" who engages in proactive self-disclosure, we introduced a "Three-Step Response Strategy" (Reaction + Self-disclosure + Closing).
In scenarios containing fillers, the proposed method largely eliminated unnatural interruptions (interruption rate: 8.7%).
While response latency increased to 2.6 seconds, the combination of the system's "Thoughtful Pause" and the empathetic self-disclosure strategy significantly improved user satisfaction.
Our approach successfully established a vibrant and natural rhythm in human-AI interaction.
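The respond-or-wait decision described above can be illustrated with a minimal Python sketch. It assumes an event that fires when VAD detects the end of speech, and it replaces the Judge Model with a hypothetical keyword heuristic. `FILLERS`, `judge`, `draft_reply`, and `on_end_of_speech` are illustrative names, not the authors' code. The sketch only shows how judging can run in parallel with reply generation, so that a drafted reply is held back while the user is still mid-turn.

```python
import asyncio
from typing import Optional

# Hypothetical filler list for illustration; not taken from the paper.
FILLERS = {"uhm", "um", "uh", "er", "ah"}


async def judge(transcript: str) -> bool:
    """Stand-in for the Judge Model: True = respond now, False = keep waiting.

    Assumption: a trailing ellipsis or a trailing filler word marks an
    unfinished turn. The real Judge Model is a separate model whose
    internals are not described in the abstract.
    """
    await asyncio.sleep(0)  # placeholder for a real model call
    text = transcript.strip().lower()
    if text.endswith("...") or text.endswith("…"):
        return False
    words = text.rstrip(".,!?…").split()
    return bool(words) and words[-1] not in FILLERS


async def draft_reply(transcript: str) -> str:
    """Placeholder for reply generation (e.g. by a realtime speech model)."""
    await asyncio.sleep(0)
    return f"(reply to: {transcript})"


async def on_end_of_speech(transcript: str) -> Optional[str]:
    """Called when VAD reports the end of speech.

    The judge and the reply draft run concurrently; the drafted reply is
    released only if the judge decides the user's turn is complete.
    """
    respond, reply = await asyncio.gather(judge(transcript), draft_reply(transcript))
    return reply if respond else None


if __name__ == "__main__":
    print(asyncio.run(on_end_of_speech("I went to Kyoto yesterday.")))
    print(asyncio.run(on_end_of_speech("So, uhm...")))
```

Running the judge concurrently with reply drafting, rather than before it, is one way to keep the gating step from adding its full latency to every turn.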
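The structure of the Three-Step Response Strategy (Reaction + Self-disclosure + Closing) can be sketched as a simple data type. This is an illustration under assumptions: the class and field names are hypothetical, the example utterance is invented, and treating the Closing step as a question back to the user is a guess that the abstract does not confirm.

```python
from dataclasses import dataclass


@dataclass
class ThreeStepResponse:
    """One system turn built from the three steps named in the abstract.

    Field contents are hypothetical; in practice each part would be
    generated by the dialogue model.
    """

    reaction: str         # Step 1: an immediate, empathetic reaction
    self_disclosure: str  # Step 2: the system shares a related experience or opinion
    closing: str          # Step 3: hands the turn back (assumed here: a question)

    def render(self) -> str:
        """Join the non-empty parts, in order, into a single utterance."""
        parts = (self.reaction, self.self_disclosure, self.closing)
        return " ".join(p.strip() for p in parts if p.strip())


turn = ThreeStepResponse(
    reaction="Wow, Kyoto!",
    self_disclosure="I love quiet old temples, too.",
    closing="Which place did you like best?",
)
print(turn.render())
# Wow, Kyoto! I love quiet old temples, too. Which place did you like best?
```

Keeping the three parts as separate fields makes it easy to check that every generated turn actually contains a self-disclosure before it is spoken.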
