The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

3:30 PM - 3:45 PM JST (6:30 AM - 6:45 AM UTC)

[2E5-GS-10o-01] Prompting Vision–Language Models for Socially Compliant Robot Navigation

〇Ling Xiao¹, Toshihiko Yamasaki² (1. Hokkaido Univ., 2. Univ. of Tokyo)

Keywords:

Human-robot interaction, Socially compliant navigation, Vision language model

Language models are increasingly applied to social robot navigation, yet principled prompt design for socially compliant behavior remains underexplored, particularly for small vision–language models (VLMs) with limited decision-making capacity. Inspired by cognitive theories of learning and motivation, we analyze prompt design along two complementary dimensions: system guidance (action-focused, reasoning-oriented, and perception–reasoning prompts) and motivational framing, in which the model competes against humans, other AI systems, or its past self. Experiments on the SNEI dataset reveal three findings. First, for fine-tuned small VLMs, competition against the model’s past self is the most effective motivational framing. Second, inappropriate system prompts can significantly degrade performance, even compared to direct fine-tuning. Third, while fine-tuning mainly improves semantic-level metrics, our prompt designs yield larger gains in action accuracy, indicating that they function primarily as decision-level constraints rather than representational enhancements.
