Presentation Information
[4Yin-B-31] Construction of a Japanese Problem-Solution Dataset for Reinforcement Learning and Improved Reasoning in Large Language Models: Improved Performance on Mathematics, Science, and Code Generation Tasks
〇Susumu Ota1, Yuta Katayama1, Sakae Mizuki1,2, Naoaki Okazaki1,2,3 (1. Institute of Science Tokyo, 2. National Institute of Advanced Industrial Science and Technology, 3. National Institute of Informatics, Research and Development Center for Large Language Models)
Keywords:
Reinforcement Learning, Large Language Model, Reinforcement Learning with Verifiable Rewards, Reasoning, Dataset Construction
We construct a Japanese problem-solution dataset to improve the reasoning ability of Japanese large language models (LLMs) on mathematics, science, and code generation tasks. The dataset is derived from an English instruction-following dataset by translating problem statements into Japanese, providing solutions, and annotating answerability, yielding Japanese problem-solution pairs designed for Reinforcement Learning with Verifiable Rewards (RLVR). Using both the English and Japanese versions of the dataset, we train models with reinforcement learning and compare their performance on comprehensive benchmarks, including high-difficulty tasks. Experimental results show that models trained on the Japanese problems achieve consistent improvements on Japanese mathematics, science, and code generation benchmarks. These findings demonstrate that language-specific datasets combined with RLVR effectively enhance the reasoning capabilities of Japanese LLMs.
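The core of RLVR is a programmatically checkable reward: the policy receives reward 1 when its final answer matches the verified reference answer in the dataset, and 0 otherwise. The sketch below is a minimal illustration of such a verifier, not the authors' implementation; the normalization rules and field names (`problem`, `reference_answer`) are assumptions for illustration.

```python
def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Binary RLVR-style reward: 1.0 if the model's final answer matches
    the reference answer after simple normalization, else 0.0.

    This exact-match check is a hypothetical sketch; real pipelines may
    use math-expression equivalence or code unit tests instead.
    """
    def normalize(s: str) -> str:
        # Strip surrounding whitespace, lowercase, and drop internal spaces
        # so that superficial formatting differences do not affect the reward.
        return "".join(s.strip().lower().split())

    return 1.0 if normalize(model_answer) == normalize(reference_answer) else 0.0


# Example: a (hypothetical) Japanese problem-solution pair from the dataset.
example = {"problem": "3 + 4 = ?", "reference_answer": "7"}
reward = verifiable_reward("7", example["reference_answer"])  # matches -> 1.0
```

For code generation tasks, the same interface would typically be backed by executing the generated program against unit tests rather than string comparison.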
