Presentation Information

[1Yin-A-42]Temporal Sequence Modeling for Boxing Action Recognition

〇Bojun AO1, Goshiro YAMAMOTO1, Sho MITARAI1, Chang LIU1, Kazumasa KISHIMOTO1, Hiroshi TAMURA1 (1. Kyoto University)

Keywords: Human Pose Estimation

Boxing action recognition is challenging due to rapid motion, occlusion, and frequent arm overlap between fighters. Existing approaches often rely on single-frame predictions and do not track individual fighters, making it difficult to analyze a specific boxer’s actions over time. We outline a pipeline that integrates SAM2 for fighter tracking, an explicit crop–scale–center spatial normalization stage for stable inputs, MediaPipe Pose for skeleton extraction, and a bidirectional LSTM-based sequence model for temporal action recognition. This paper summarizes the motivation, method design, dataset construction, and planned contributions toward a temporal sequence modeling framework for boxing action recognition.
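The crop–scale–center normalization stage described above might be implemented as in the sketch below. The keypoint layout, the bounding-box convention, and the choice to scale by the longer box side are assumptions for illustration, not details taken from the paper; in practice the box could come from the SAM2 track mask and the keypoints from MediaPipe Pose.

```python
import numpy as np

def normalize_keypoints(keypoints, bbox):
    """Crop-scale-center normalization of 2D pose keypoints (illustrative).

    keypoints: (J, 2) array of (x, y) pixel coordinates, e.g. from MediaPipe Pose.
    bbox: (x_min, y_min, x_max, y_max) of the tracked fighter, e.g. from a SAM2 mask.
    Returns keypoints cropped to the fighter's box, scaled so the longer box
    side has unit length, and centered so the box midpoint maps to the origin.
    """
    x_min, y_min, x_max, y_max = bbox
    # Crop: shift coordinates so the box's top-left corner becomes the origin.
    cropped = keypoints - np.array([x_min, y_min], dtype=float)
    # Scale: divide by the longer box side so values are resolution-independent.
    scale = max(x_max - x_min, y_max - y_min)
    scaled = cropped / scale
    # Center: subtract the scaled box midpoint so the origin is the box center.
    center = np.array([x_max - x_min, y_max - y_min], dtype=float) / (2.0 * scale)
    return scaled - center
```

Normalizing per fighter in this way makes the pose features invariant to where the boxer stands in the frame and to camera resolution, which is what allows a sequence model to compare motions across clips.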
Current experimental results demonstrate class-wise recognition accuracies of 55.2% for jab, 44.7% for hook, 33.1% for cross, 29.8% for uppercut, 30.4% for the 1–2 combination, and 14.9% for body shot. While these results indicate the effectiveness of temporal sequence modeling, performance is currently constrained by limited dataset size and class imbalance. Future work will focus on data expansion, improved feature representations, and more advanced model architectures to further enhance recognition performance.
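The bidirectional LSTM sequence model referred to above could be sketched as follows in PyTorch. The feature size (33 MediaPipe landmarks × 2 coordinates), hidden size, mean-pooling over time, and the six-class output matching the classes reported in the results are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class BiLSTMActionClassifier(nn.Module):
    """Bidirectional LSTM over per-frame pose features (illustrative sketch).

    Input:  (batch, frames, features) sequences of normalized keypoints,
            e.g. 33 MediaPipe landmarks x 2 coordinates = 66 features (assumed).
    Output: (batch, num_classes) logits over punch classes
            (jab, cross, hook, uppercut, 1-2 combination, body shot).
    """
    def __init__(self, in_features=66, hidden=128, num_classes=6):
        super().__init__()
        self.lstm = nn.LSTM(in_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        out, _ = self.lstm(x)       # (batch, frames, 2 * hidden)
        pooled = out.mean(dim=1)    # mean-pool over time (a design assumption)
        return self.head(pooled)
```

A clip of, say, 30 frames would be passed as a `(1, 30, 66)` tensor; the bidirectional pass lets each frame's representation draw on both earlier and later motion, which matters for punches whose start and recovery phases look similar.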