Presentation Information

[3H1-OS-9a-04] Learning Interpretable Koopman Representations from Video

〇Henrik Krauss¹, Naoya Takeishi¹, Takehisa Yairi¹ (1. The University of Tokyo)

Keywords:

Koopman Operator Learning, Representation Learning, Dynamical Systems, Interpretable Machine Learning

Learning dynamical models from video (i.e., image sequences) is challenging due to high-dimensional visual inputs and the limited interpretability of learned representations. Koopman-based methods embed nonlinear dynamics into a lifted latent space with approximately linear evolution, but the semantic meaning of the latent variables is often unclear. In this work, we investigate interpretable Koopman dynamics learning from video using a custom decoder that links latent variables to pixel-accurate attention maps on the decoded image. Through analyses on synthetic video datasets with independently moving objects, we evaluate our model's ability to learn object-aligned latent representations in comparison to plain Koopman models. We further analyze the structure of the learned Koopman dynamics induced by the attention-based decoder. Additionally, the attention maps enable intuitive visualization of each latent variable's contribution to image reconstruction.
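To illustrate the core idea behind Koopman-based methods mentioned above, the following is a minimal, self-contained sketch (not the authors' model): once states are lifted into a latent space where evolution is approximately linear, z_{t+1} ≈ K z_t, the Koopman matrix K can be estimated from latent trajectories by least squares, as in dynamic mode decomposition. All variable names and the toy dynamics here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ground-truth linear latent dynamics (a slightly contracting rotation),
# used only to simulate a toy latent trajectory; in the paper's setting
# the latents would come from an encoder applied to video frames.
theta = 0.1
K_true = 0.99 * np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])

# Simulate a latent trajectory z_0, ..., z_T.
T = 200
Z = np.empty((T + 1, 2))
Z[0] = rng.normal(size=2)
for t in range(T):
    Z[t + 1] = K_true @ Z[t]

# Least-squares (DMD-style) estimate of K: solve Z[:-1] @ K^T ≈ Z[1:].
K_est_T, *_ = np.linalg.lstsq(Z[:-1], Z[1:], rcond=None)
K_est = K_est_T.T

print(np.allclose(K_est, K_true, atol=1e-6))  # → True (noise-free data)
```

In a full video model, the encoder, the linear matrix K, and the decoder are trained jointly; the abstract's contribution lies in an attention-based decoder that makes each latent dimension's effect on the reconstructed image visible.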