Presentation Information
[4Yin-B-31] Construction of a Japanese Problem-Solution Dataset for Reinforcement Learning and Improved Reasoning in Large Language Models: Improved Performance on Mathematics, Science, and Code Generation Tasks
〇Susumu Ota1, Yuta Katayama1, Sakae Mizuki1,2, Naoaki Okazaki1,2,3 (1. Institute of Science Tokyo, 2. National Institute of Advanced Industrial Science and Technology, 3. National Institute of Informatics, Research and Development Center for Large Language Models)
Keywords:
Reinforcement Learning, Large Language Model, Reinforcement Learning with Verifiable Rewards, Reasoning, Dataset Construction
We construct a Japanese problem-solution dataset to improve the reasoning ability of Japanese large language models (LLMs) on mathematics, science, and code generation tasks. The dataset is derived from an English instruction-following dataset by translating problem statements into Japanese, providing solutions, and annotating answerability, yielding Japanese problem-solution pairs designed for Reinforcement Learning with Verifiable Rewards (RLVR). Using both the English and Japanese versions of the dataset, we train models with reinforcement learning and compare their performance on comprehensive benchmarks, including high-difficulty tasks. Experimental results show that models trained on the Japanese problems achieve consistent improvements on Japanese mathematics, science, and code generation benchmarks. These findings demonstrate that language-specific datasets combined with RLVR effectively enhance the reasoning capabilities of Japanese LLMs.
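The core of RLVR is a programmatically checkable reward: the policy receives reward 1 when its final answer matches the verified reference answer in the dataset, and 0 otherwise. The sketch below is a minimal illustration of such a verifier, not the authors' implementation; the normalization rules and field names (`problem`, `reference_answer`) are assumptions for illustration.

```python
def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """Binary RLVR-style reward: 1.0 if the model's final answer matches
    the reference answer after simple normalization, else 0.0.

    This exact-match check is a hypothetical sketch; real pipelines may
    use math-expression equivalence or code unit tests instead.
    """
    def normalize(s: str) -> str:
        # Strip surrounding whitespace, lowercase, and drop internal spaces
        # so that superficial formatting differences do not affect the reward.
        return "".join(s.strip().lower().split())

    return 1.0 if normalize(model_answer) == normalize(reference_answer) else 0.0


# Example: a (hypothetical) Japanese problem-solution pair from the dataset.
example = {"problem": "3 + 4 = ?", "reference_answer": "7"}
reward = verifiable_reward("7", example["reference_answer"])  # matches -> 1.0
```

For code generation tasks, the same interface would typically be backed by executing the generated program against unit tests rather than string comparison.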
