Presentation Information

[5M1-GS-2b-01] AlphaZeRS: Efficient Decision-Making with Limited Computational Resources

Takumi Watanabe1, 〇Suguru Takauchi1, Yu Kamata2, Ryoji Sakuraoka2, Yu Kohno1, Tatsuji Takahashi1 (1. School of Science and Engineering, Tokyo Denki University, 2. Graduate School of Tokyo Denki University)

Keywords:

Decision-Making Task, Machine Learning, Neural Network

AlphaZero, introduced by DeepMind in 2017, achieved superhuman performance in board games such as Go and Shogi through self-play learning that combines Monte Carlo tree search (MCTS) with deep neural networks. However, AlphaZero’s expected-return maximization typically requires a large number of MCTS simulations during both training and inference, which limits deployment under tight computational budgets and low-latency requirements. Risk-Sensitive Satisficing (RS) offers an alternative decision rule that models satisficing under bounded rationality: it sets an aspiration level and stops searching once it finds an action whose value exceeds that level. We propose AlphaZeRS, which incorporates RS into AlphaZero to reduce redundant search while maintaining competitive move quality. Using the game of Reversi (Othello), which has a vast and complex state space, we compare AlphaZeRS with standard AlphaZero under varying node-search budgets. Our experiments, including statistical superiority tests, show that AlphaZeRS attains a higher win rate than standard AlphaZero while searching fewer nodes.
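The RS decision rule described above can be sketched as follows. This is a minimal illustration based on the published RS formulation, RS_i = (n_i / N)(Q_i − ℵ), where n_i is the visit count of action i, Q_i its estimated value, N the total visit count, and ℵ (aleph) the aspiration level; the function names and the way it would plug into AlphaZero's MCTS are assumptions, not the authors' implementation:

```python
import numpy as np

def rs_values(counts, values, aleph):
    """Risk-Sensitive Satisficing (RS) value of each action.

    RS_i = (n_i / N) * (Q_i - aleph): actions above the aspiration
    level are reinforced in proportion to how often they were tried,
    while below aspiration the least-tried actions score highest,
    driving exploration. (Illustrative sketch, not the paper's code.)
    """
    counts = np.asarray(counts, dtype=float)
    values = np.asarray(values, dtype=float)
    total = counts.sum()
    if total == 0:
        ratios = np.full(len(counts), 1.0 / len(counts))
    else:
        ratios = counts / total
    return ratios * (values - aleph)

def select_action(counts, values, aleph):
    # Greedy choice over RS values, as a stand-in for the PUCT
    # selection step inside an AlphaZero-style tree search.
    return int(np.argmax(rs_values(counts, values, aleph)))
```

With an attainable aspiration (e.g. `aleph=0.6` and a best estimated value of 0.8), the rule keeps choosing the satisfying action rather than spending further simulations on alternatives, which is the mechanism AlphaZeRS exploits to cut redundant node searches.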