Presentation Information

[5L3-OS-6b-03] Effectiveness and Application Conditions of Japanese Tuning Methods for SLMs (Small Language Models)

〇Qiang Zhong1, Toshiharu Ito1, Kenji Dohi1, Hirokazu Aoshima1 (1. Hitachi Hi-System21 Co., Ltd.)

Keywords:

Small Language Model, Fine-tuning, Reinforcement Learning

When deploying generative AI on resource-constrained devices, Small Language Models (SLMs) are more suitable than LLMs but may require additional tuning. In this study, we applied three tuning methods (SFT, GRPO, and SFT+GRPO) to two SLMs (Llama-3.2-3B and Qwen2.5-3B) and evaluated how effectively each improves Japanese language capability using the llm-jp-eval benchmark. The results show that (1) SFT yielded comparable improvements for both models (+5.6% and +6.1%), (2) GRPO showed a marked difference between Llama (+3.6%) and Qwen (+0.6%), and (3) SFT+GRPO slightly outperformed SFT alone for Llama (+5.6% → +5.9%) but underperformed it for Qwen (+6.1% → +5.3%). These results suggest an interaction between base model characteristics and the effectiveness of each tuning method.
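To illustrate the SFT followed by GRPO setup described above, the following is a minimal sketch using the Hugging Face TRL library; it is not the authors' implementation. The model identifiers, dataset files, hyperparameters, and the reward function (here a placeholder that scores the fraction of Japanese characters in a completion) are all assumptions for illustration only.

```python
# Minimal sketch of an SFT -> GRPO pipeline with Hugging Face TRL (assumed tooling,
# not the authors' code). Dataset paths and the reward function are hypothetical.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

base_model = "Qwen/Qwen2.5-3B"  # or "meta-llama/Llama-3.2-3B"

# Hypothetical Japanese data: SFT expects full instruction-response text,
# GRPO expects a "prompt" column to sample completions from.
sft_data = load_dataset("json", data_files="ja_sft.jsonl", split="train")
grpo_data = load_dataset("json", data_files="ja_prompts.jsonl", split="train")

# Stage 1: supervised fine-tuning (SFT).
sft_trainer = SFTTrainer(
    model=base_model,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="sft-out", num_train_epochs=1),
)
sft_trainer.train()

# Placeholder reward for GRPO: fraction of Japanese characters (kana/kanji)
# in each generated completion; a real reward would reflect task quality.
def reward_japanese(completions, **kwargs):
    def ja_ratio(text):
        ja = sum("\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff" for ch in text)
        return ja / max(len(text), 1)
    return [ja_ratio(c) for c in completions]

# Stage 2: GRPO on top of the SFT checkpoint (the SFT+GRPO condition);
# starting from base_model instead gives the GRPO-only condition.
grpo_trainer = GRPOTrainer(
    model="sft-out",
    reward_funcs=reward_japanese,
    train_dataset=grpo_data,
    args=GRPOConfig(output_dir="grpo-out", num_generations=4),
)
grpo_trainer.train()
```

The tuned checkpoints would then be evaluated with the llm-jp-eval benchmark to obtain scores comparable to those reported above.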