Presentation Information

[5M1-GS-2b-02]Sample-Efficient Reinforcement Learning through Cross Bisimulation-Based Implicit Imitation Learning

〇Takahisa Imagawa1, Shuichi Enokida1 (1. Kyushu Institute of Technology)

Keywords:

Reinforcement Learning, Imitation Learning

Reinforcement learning is a useful methodology with a wide range of applications; however, it generally requires a large amount of data, and reducing this requirement remains an important challenge. In this study, we propose Cross Bisimulation-based Implicit Imitation Learning (CBI2L), a method that improves the sample efficiency of reinforcement learning by leveraging a small amount of imitation data that is not necessarily of high quality. Specifically, for the Markov decision processes of two agents, namely a mentor to be imitated and an observer that performs reinforcement learning, we define a pseudometric called the cross bisimulation metric, which is based on the difference in cumulative rewards between them. We then theoretically analyze (i) the unique existence of the cross bisimulation metric via the fixed-point theorem and (ii) the relationship between the cross bisimulation metric and the difference in the expected cumulative rewards of the mentor and the observer, thereby establishing the validity of CBI2L, which uses the mentor's rewards for the observer's learning. Furthermore, we incorporate CBI2L into Soft Actor-Critic (SAC) and empirically show that it improves learning efficiency compared to SAC in the PointMaze environment.
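The abstract does not spell out the metric itself; purely as a hedged sketch, assuming the cross bisimulation metric follows the classic bisimulation-metric recursion of Ferns et al. adapted to a mentor MDP M and an observer MDP O sharing an action set A, it might arise as the fixed point of an operator such as

\[
d(s_M, s_O) \;=\; \max_{a \in A} \Big( \big| r_M(s_M, a) - r_O(s_O, a) \big| \;+\; \gamma\, W_1(d)\big( P_M(\cdot \mid s_M, a),\; P_O(\cdot \mid s_O, a) \big) \Big),
\]

where r_M, r_O are the two reward functions, P_M, P_O the transition kernels, \gamma \in [0,1) the discount factor, and W_1(d) the Wasserstein-1 distance induced by d; this notation is illustrative and not taken from the paper. For an operator of this form, the right-hand side is a \gamma-contraction in the sup norm over bounded pseudometrics, so Banach's fixed-point theorem would give the unique existence mentioned in the abstract, and a bound of the form \( |V_M^*(s_M) - V_O^*(s_O)| \le d(s_M, s_O) \) would be the natural analogue of the stated relationship between the metric and the gap in expected cumulative rewards.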