Presentation Information

[3Yin-A-08] Improving Accuracy of VLM-based Elemental Motion Recognition using Therblig Motion Transitions

〇Toshiki Kotani¹, Ryo Sakai¹, Taiki Fuji¹, Kentaro Yoshimura¹ (1. Hitachi, Ltd.)

Keywords:

Vision Language Model, Elemental motion recognition, Therblig analysis

The labor shortage in manufacturing and maintenance has created an urgent need for robot automation. Because conventional robot teaching is time-consuming, technologies that generate robot actions directly from videos of human work with minimal effort are essential. A system of elemental motions that describes work in a unified way across human and robot embodiments is an effective basis for this. This study proposes a training-free elemental motion recognition framework that combines vision language models (VLMs) with Therblig analysis. Single-frame VLM inference often misidentifies preparatory motions as grasping actions because the hand overlaps the object, and although continuous multi-frame inference could improve accuracy, it is computationally expensive. The framework therefore introduces transition-triggered local introspection, which references adjacent frames only when an error-prone Therblig transition is detected. Experiments on inspection tasks show that the proposed method improves recognition accuracy while keeping computational cost low.
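
As a rough illustration of the transition-triggered idea described in the abstract, the following Python sketch labels each frame with a single-frame classifier and consults neighbouring frames only when the transition from the previous label looks error-prone. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `radius` parameter, the label strings, and the choice of `("transport_empty", "grasp")` as an error-prone transition are all hypothetical, and the two classifiers stand in for VLM calls that the abstract does not specify.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple

# Hypothetical example of an error-prone Therblig transition (assumption:
# the abstract does not list which transitions are treated as error-prone).
ERROR_PRONE: Set[Tuple[str, str]] = {("transport_empty", "grasp")}


def recognize_with_local_introspection(
    frames: Sequence[str],
    classify_single: Callable[[str], str],
    classify_window: Callable[[Sequence[str], int], str],
    error_prone: Set[Tuple[str, str]] = ERROR_PRONE,
    radius: int = 1,
) -> List[str]:
    """Label frames one by one; re-check with adjacent frames only when
    the (previous label, current label) pair is in `error_prone`."""
    labels: List[str] = []
    prev: Optional[str] = None
    for i, frame in enumerate(frames):
        # Cheap path: one single-frame inference per frame.
        label = classify_single(frame)
        if prev is not None and (prev, label) in error_prone:
            # Expensive path: look at a small window around frame i.
            lo = max(0, i - radius)
            hi = min(len(frames), i + radius + 1)
            label = classify_window(frames[lo:hi], i - lo)
        labels.append(label)
        prev = label
    return labels


# Stub classifiers standing in for VLM calls (purely illustrative).
window_calls = 0


def stub_single(frame: str) -> str:
    # Mimics the failure mode: hand-object overlap is read as a grasp.
    return "grasp" if "overlap" in frame or frame == "grasp" else "transport_empty"


def stub_window(window: Sequence[str], center: int) -> str:
    global window_calls
    window_calls += 1
    # With context, the overlapping frame is recognised as still approaching.
    return "transport_empty" if window[center].endswith("_overlap") else "grasp"


demo_labels = recognize_with_local_introspection(
    ["reach", "reach_overlap", "grasp"], stub_single, stub_window
)
```

In this sketch the multi-frame check runs only on frames where a flagged transition occurs, so the number of expensive calls depends on how often such transitions appear rather than on the video length.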
