Presentation Information

[1Yin-B-15] Improving the Zero-Shot Classification Performance of the Skeleton-Based Action Classification Model GAP by Diversifying and Elaborating Training Texts

〇Haruki Yamada1, Kentaro Kasama1, Madoka Inoue2, Ryo Taguchi1 (1. Nagoya Institute of Technology, 2. AIPHONE CO., LTD.)

Keywords:

Skeleton-based Action Recognition, Zero-Shot Classification, Multimodal

The skeleton-based action classification model GAP, proposed by Xiang et al., improves classification accuracy through contrastive learning between text and actions; however, its zero-shot classification capability has not been examined. Our experiments reveal that the training texts used in the prior work make it difficult for the model to generalize to unseen data. Although the previous study generates part-wise motion descriptions using an LLM, its prompts tend to produce texts with limited diversity and insufficient expressiveness. In this study, we therefore aim to improve the quality of action representations by prompting an LLM to generate descriptions from three perspectives: hierarchical skeletal structure, relative positions, and Laban theory. In addition, to mitigate overfitting of the text encoder, we train the text encoder for a shorter period than the action encoder. Experiments use the NTU RGB+D 60 dataset, which contains 60 action classes; in our setting, 59 classes are used for training and the remaining class is held out for evaluation. Experimental results demonstrate that the proposed method improves zero-shot classification performance.
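The evaluation setting described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy embeddings, the use of cosine similarity for matching, and the epoch-based rule for stopping text encoder updates are all assumptions made for illustration. The abstract states only that 59 classes are used for training, one class is held out, and the text encoder is trained for a shorter period than the action encoder.

```python
import math


def leave_one_class_out(num_classes, held_out):
    """Split class indices into seen (training) and unseen (evaluation) sets.

    With num_classes=60 this gives the 59/1 split described in the abstract.
    """
    seen = [c for c in range(num_classes) if c != held_out]
    return seen, [held_out]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(action_emb, text_embs):
    """Return the label whose text embedding is most similar to the action.

    text_embs maps each candidate label to its text embedding. Which labels
    are candidates at evaluation time is an assumption of this sketch.
    """
    return max(text_embs, key=lambda label: cosine(action_emb, text_embs[label]))


def text_encoder_trainable(epoch, text_epochs):
    """Hypothetical schedule: update the text encoder only in early epochs.

    The action encoder would keep training after epoch >= text_epochs.
    """
    return epoch < text_epochs


# Toy usage with made-up 2-D embeddings.
seen, unseen = leave_one_class_out(60, held_out=7)
label = zero_shot_predict([1.0, 0.0], {"wave": [0.9, 0.1], "sit": [0.0, 1.0]})
```

In a contrastive setup of this kind, an unseen class can be scored because its text description is embedded into the same space as the skeleton features, which is why the diversity and expressiveness of the training texts matter for generalization.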