Presentation Information

[3O2-IS-3-01] HoAstBench: A Comprehensive Benchmark for LLM Agents in Smart Homes

〇DONG PEIZHE¹, TRAN Vu Duc¹ (¹Japan Advanced Institute of Science and Technology)
Regular, Online

Keywords:

AI, Agent, Smart Home

Integrating large language models (LLMs) into smart home systems holds significant promise for improving interaction between users and smart devices. While existing research has begun to incorporate LLMs into smart home assistants, these efforts remain largely confined to processing relatively simple and straightforward user commands. Real-world environments, in contrast, demand far greater capabilities: interpreting potentially ambiguous or invalid user commands, handling multi-step tasks, and maintaining coherent context across multi-round conversations. Practical deployment of LLMs also faces further critical challenges, including the inherent unpredictability of LLM outputs, high inference latency, and high API costs.

To comprehensively evaluate LLM-integrated smart home assistants under such realistic and complex conditions, we introduce HoAstBench, a benchmark dataset constructed from common command patterns observed in real-world smart home interactions. We conduct extensive experimental evaluations across multiple mainstream LLMs. Our results show that even today's most advanced models still exhibit notable limitations, particularly in handling complex, multi-device, or invalid user commands. This highlights substantial room for improvement in this emerging application area.