Presentation Information

[3Yin-A-08] Improving Accuracy of VLM-based Elemental Motion Recognition using Therblig Motion Transitions

〇Toshiki Kotani1, Ryo Sakai1, Taiki Fuji1, Kentaro Yoshimura1 (1. Hitachi, Ltd.)

Keywords:

Vision Language Model, Elemental motion recognition, Therblig analysis

Labor shortages in manufacturing and maintenance have created an urgent need for robot automation. Because traditional robot teaching is time-consuming, technologies that generate robot actions directly from videos of human work with minimal effort are essential. A system of elemental motions that describes actions uniformly across the different embodiments of humans and robots is an effective basis for this. Single-frame inference with a vision-language model (VLM), however, often misidentifies preparatory motions as grasping actions because the hand and the object overlap in the image. Continuous multi-frame inference could improve accuracy, but it is computationally expensive. This study proposes a training-free elemental motion recognition framework that combines VLMs with Therblig analysis, and introduces transition-triggered local introspection, which references adjacent frames only when an error-prone Therblig transition is detected. Experiments on inspection tasks show that the proposed method improves recognition accuracy while keeping computational cost low.
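
The control flow of transition-triggered local introspection can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the label strings, the contents of the error-prone transition set, the window radius, and the choice to compare against the previously accepted label are all assumptions made for illustration. The two classifier arguments stand in for a single-frame VLM query and a more expensive multi-frame VLM query.

```python
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical set of error-prone transitions as (previous label, current label).
# The abstract only says that preparatory motions are mistaken for grasping;
# the concrete label names here are assumptions.
ERROR_PRONE: Set[Tuple[str, str]] = {("transport_empty", "grasp")}


def recognize(
    frames: Sequence[object],
    classify_single: Callable[[object], str],
    classify_local: Callable[[Sequence[object]], str],
    error_prone: Set[Tuple[str, str]] = ERROR_PRONE,
    radius: int = 1,
) -> Tuple[List[str], int]:
    """Label each frame, re-checking with neighbors only on flagged transitions.

    Returns the label sequence and the number of multi-frame calls made.
    """
    labels: List[str] = []
    local_calls = 0
    for i, frame in enumerate(frames):
        # Cheap path: one single-frame query per frame.
        label = classify_single(frame)
        prev = labels[-1] if labels else None
        if prev is not None and (prev, label) in error_prone:
            # Expensive path: look at adjacent frames around index i.
            lo = max(0, i - radius)
            hi = min(len(frames), i + radius + 1)
            label = classify_local(frames[lo:hi])
            local_calls += 1
        labels.append(label)
    return labels, local_calls
```

Under these assumptions the multi-frame query runs only at flagged transitions, so its cost scales with the number of suspicious transitions rather than with the length of the video.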