Presentation Information
[5I1-OS-3-03] Expert Information Reconstruction-based Policy Learning for Energy Management Strategy in Hybrid Electric Vehicles
〇Yuepeng Wang1, Xun Shen1 (1. Tokyo University of Agriculture and Technology)
Keywords:
AI, Learning-based control
Conventional energy management strategies based on deep reinforcement learning (DRL) often suffer from slow convergence and lack of generalization.
This paper proposes an Expert Information Reconstruction-based Policy Learning (EIRPL) framework to address these issues.
Specifically, EIRPL first reconstructs an expert policy from driving data via dynamic programming, yielding a policy that optimizes energy efficiency and is globally optimal for each driving scenario.
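The expert-reconstruction step can be sketched as a backward dynamic program over a discretized battery state of charge (SOC). The drive-cycle power demand, fuel-rate model, and battery parameters below are illustrative placeholders, not the paper's vehicle model:

```python
import numpy as np

# Hypothetical backward DP for an HEV power split: at each time step and
# SOC level, choose the engine share of the power demand that minimizes
# fuel use plus the interpolated cost-to-go. All models are toy assumptions.
T = 100                                   # drive-cycle length (steps)
soc_grid = np.linspace(0.3, 0.8, 51)      # discretized battery SOC
splits = np.linspace(0.0, 1.0, 11)        # candidate engine shares
demand = 20.0 + 10.0 * np.sin(np.arange(T) / 10.0)  # toy power demand (kW)

def fuel_rate(p_engine):
    """Toy convex fuel-rate model (g/s) as a function of engine power (kW)."""
    return 0.1 * p_engine + 0.002 * p_engine**2

def next_soc(soc, p_batt, dt=1.0, capacity_kwh=1.5):
    """Battery SOC after discharging p_batt kW for dt seconds, clipped to bounds."""
    return np.clip(soc - p_batt * dt / (3600.0 * capacity_kwh), 0.3, 0.8)

V = np.zeros(len(soc_grid))               # terminal cost-to-go
policy = np.zeros((T, len(soc_grid)))     # optimal engine share per (t, SOC)
for t in reversed(range(T)):
    V_new = np.empty_like(V)
    for i, soc in enumerate(soc_grid):
        costs = []
        for u in splits:
            p_eng = u * demand[t]
            p_bat = (1.0 - u) * demand[t]
            s_next = next_soc(soc, p_bat)
            costs.append(fuel_rate(p_eng) + np.interp(s_next, soc_grid, V))
        j = int(np.argmin(costs))
        V_new[i] = costs[j]
        policy[t, i] = splits[j]
    V = V_new
```

Because the backward pass sweeps the whole cycle, the resulting table `policy` is globally optimal for that cycle under the assumed models, which is what qualifies it as an "expert" for the later learning stages.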
Then, an implicit action-value function that captures expert knowledge is recovered from the trained optimal policy through adversarial inverse reinforcement learning.
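The idea behind adversarial inverse reinforcement learning is that a discriminator of the form D = sigmoid(f(s, a) - log pi(a|s)), trained to separate expert transitions from policy transitions, recovers f as a reward/value signal. A minimal sketch with a linear f, toy feature vectors, and a fixed policy likelihood (all assumptions for illustration):

```python
import numpy as np

# Sketch of the AIRL-style discriminator: learn f_w(s, a) so that
# sigmoid(f_w - log pi) classifies expert vs. policy samples; f_w then
# acts as a recovered reward. Features and likelihoods are toy values.
rng = np.random.default_rng(0)
expert = rng.normal(1.0, 0.5, size=(256, 4))    # toy expert (s, a) features
sampled = rng.normal(-1.0, 0.5, size=(256, 4))  # toy policy (s, a) features
log_pi = np.log(0.5)                            # fixed policy likelihood (assumed)

w = np.zeros(4)
lr = 0.1
for _ in range(200):
    for x, label in ((expert, 1.0), (sampled, 0.0)):
        f = x @ w                                # f_w(s, a)
        d = 1.0 / (1.0 + np.exp(-(f - log_pi)))  # discriminator output
        grad = ((d - label)[:, None] * x).mean(axis=0)  # cross-entropy gradient
        w -= lr * grad

recovered_reward = expert @ w - log_pi           # f_w - log pi on expert data
```

After training, f assigns higher values to expert-like state-action pairs, which is the sense in which an implicit action-value signal is extracted from the optimal policy.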
Finally, the Soft Actor–Critic (SAC) algorithm is employed to optimize the policy corresponding to the learned value function.
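SAC's policy improvement step sets pi(a|s) proportional to exp(Q(s, a)/alpha), trading off value against entropy. A small tabular sketch, where the Q values stand in for the action-value function recovered in the previous stage (the numbers are illustrative):

```python
import numpy as np

# Soft policy improvement a la SAC on a given Q table:
# pi(a|s) proportional to exp(Q(s, a) / alpha), with entropy weight alpha.
alpha = 0.2                                     # entropy temperature (assumed)
Q = np.array([[1.0, 0.5, -0.2],                 # Q(s, a): 2 states, 3 actions
              [0.1, 0.8,  0.3]])

logits = Q / alpha
pi = np.exp(logits - logits.max(axis=1, keepdims=True))  # stable softmax
pi /= pi.sum(axis=1, keepdims=True)

# Soft state value: V(s) = E_pi[Q(s, a) - alpha * log pi(a|s)]
V_soft = (pi * (Q - alpha * np.log(pi))).sum(axis=1)
```

Starting SAC from a value function already shaped by expert data is what lets the policy stage converge without relearning the reward structure from scratch.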
Notably, the proposed EIRPL separates the training of the value function and policy into two distinct stages, which accelerates convergence and enhances robustness.
Simulation results demonstrate a significant improvement in convergence speed, higher cumulative reward, and a noticeable reduction in fuel consumption across multiple driving cycles.
