Presentation Information

[1Yin-A-42]Temporal Sequence Modeling for Boxing Action Recognition

〇Bojun AO1, Goshiro YAMAMOTO1, Sho MITARAI1, Chang LIU1, Kazumasa KISHIMOTO1, Hiroshi TAMURA1 (1. Kyoto University)

Keywords: Human Pose Estimation

Boxing action recognition is challenging due to rapid motion, occlusion, and frequent arm overlap between fighters. Existing approaches often rely on single-frame predictions and do not track individual fighters, making it difficult to analyze a specific boxer’s actions over time. We outline a pipeline that integrates SAM2 for fighter tracking, an explicit crop–scale–center spatial normalization stage for stable inputs, MediaPipe Pose for skeleton extraction, and a bidirectional LSTM-based sequence model for temporal action recognition. This paper summarizes the motivation, method design, dataset construction, and planned contributions toward a temporal sequence modeling framework for boxing action recognition.
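The crop–scale–center normalization stage described above might be implemented as in the sketch below. The keypoint layout, the bounding-box convention, and the choice to scale by the longer box side are assumptions for illustration, not details taken from the paper; in practice the box could come from the SAM2 track mask and the keypoints from MediaPipe Pose.

```python
import numpy as np

def normalize_keypoints(keypoints, bbox):
    """Crop-scale-center normalization of 2D pose keypoints (illustrative).

    keypoints: (J, 2) array of (x, y) pixel coordinates, e.g. from MediaPipe Pose.
    bbox: (x_min, y_min, x_max, y_max) of the tracked fighter, e.g. from a SAM2 mask.
    Returns keypoints cropped to the fighter's box, scaled so the longer box
    side has unit length, and centered so the box midpoint maps to the origin.
    """
    x_min, y_min, x_max, y_max = bbox
    # Crop: shift coordinates so the box's top-left corner becomes the origin.
    cropped = keypoints - np.array([x_min, y_min], dtype=float)
    # Scale: divide by the longer box side so values are resolution-independent.
    scale = max(x_max - x_min, y_max - y_min)
    scaled = cropped / scale
    # Center: subtract the scaled box midpoint so the origin is the box center.
    center = np.array([x_max - x_min, y_max - y_min], dtype=float) / (2.0 * scale)
    return scaled - center
```

Normalizing per fighter in this way makes the pose features invariant to where the boxer stands in the frame and to camera resolution, which is what allows a sequence model to compare motions across clips.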
Current experimental results demonstrate class-wise recognition accuracies of 55.2% for jab, 44.7% for hook, 33.1% for cross, 29.8% for uppercut, 30.4% for the 1–2 combination, and 14.9% for body shot. While these results indicate the effectiveness of temporal sequence modeling, performance is currently constrained by limited dataset size and class imbalance. Future work will focus on data expansion, improved feature representations, and more advanced model architectures to further enhance recognition performance.
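The bidirectional LSTM sequence model referred to above could be sketched as follows in PyTorch. The feature size (33 MediaPipe landmarks × 2 coordinates), hidden size, mean-pooling over time, and the six-class output matching the classes reported in the results are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class BiLSTMActionClassifier(nn.Module):
    """Bidirectional LSTM over per-frame pose features (illustrative sketch).

    Input:  (batch, frames, features) sequences of normalized keypoints,
            e.g. 33 MediaPipe landmarks x 2 coordinates = 66 features (assumed).
    Output: (batch, num_classes) logits over punch classes
            (jab, cross, hook, uppercut, 1-2 combination, body shot).
    """
    def __init__(self, in_features=66, hidden=128, num_classes=6):
        super().__init__()
        self.lstm = nn.LSTM(in_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        out, _ = self.lstm(x)       # (batch, frames, 2 * hidden)
        pooled = out.mean(dim=1)    # mean-pool over time (a design assumption)
        return self.head(pooled)
```

A clip of, say, 30 frames would be passed as a `(1, 30, 66)` tensor; the bidirectional pass lets each frame's representation draw on both earlier and later motion, which matters for punches whose start and recovery phases look similar.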