Presentation Information

[5L3-OS-6b-03] Effectiveness and Application Conditions of Japanese Tuning Methods for SLMs (Small Language Models)

〇Qiang Zhong1, Toshiharu Ito1, Kenji Dohi1, Hirokazu Aoshima1 (1. Hitachi Hi-System21 Co., Ltd.)

Keywords:

Small Language Model, Fine-tuning, Reinforcement Learning

When deploying generative AI on resource-constrained devices, Small Language Models (SLMs) are more suitable than LLMs but may require additional tuning. In this study, we applied three tuning methods (SFT, GRPO, and SFT+GRPO) to two SLMs (Llama-3.2-3B and Qwen2.5-3B) and evaluated how effectively each improves Japanese language capability using the llm-jp-eval benchmark. The results show that (1) SFT yielded comparable improvements for both models (+5.6% and +6.1%), (2) GRPO showed a marked difference between Llama (+3.6%) and Qwen (+0.6%), and (3) SFT+GRPO slightly outperformed SFT alone for Llama (+5.6% → +5.9%) but underperformed it for Qwen (+6.1% → +5.3%). These results suggest an interaction between base model characteristics and the effectiveness of each tuning method.
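To illustrate the SFT followed by GRPO setup described above, the following is a minimal sketch using the Hugging Face TRL library; it is not the authors' implementation. The model identifiers, dataset files, hyperparameters, and the reward function (here a placeholder that scores the fraction of Japanese characters in a completion) are all assumptions for illustration only.

```python
# Minimal sketch of an SFT -> GRPO pipeline with Hugging Face TRL (assumed tooling,
# not the authors' code). Dataset paths and the reward function are hypothetical.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

base_model = "Qwen/Qwen2.5-3B"  # or "meta-llama/Llama-3.2-3B"

# Hypothetical Japanese data: SFT expects full instruction-response text,
# GRPO expects a "prompt" column to sample completions from.
sft_data = load_dataset("json", data_files="ja_sft.jsonl", split="train")
grpo_data = load_dataset("json", data_files="ja_prompts.jsonl", split="train")

# Stage 1: supervised fine-tuning (SFT).
sft_trainer = SFTTrainer(
    model=base_model,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="sft-out", num_train_epochs=1),
)
sft_trainer.train()

# Placeholder reward for GRPO: fraction of Japanese characters (kana/kanji)
# in each generated completion; a real reward would reflect task quality.
def reward_japanese(completions, **kwargs):
    def ja_ratio(text):
        ja = sum("\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff" for ch in text)
        return ja / max(len(text), 1)
    return [ja_ratio(c) for c in completions]

# Stage 2: GRPO on top of the SFT checkpoint (the SFT+GRPO condition);
# starting from base_model instead gives the GRPO-only condition.
grpo_trainer = GRPOTrainer(
    model="sft-out",
    reward_funcs=reward_japanese,
    train_dataset=grpo_data,
    args=GRPOConfig(output_dir="grpo-out", num_generations=4),
)
grpo_trainer.train()
```

The tuned checkpoints would then be evaluated with the llm-jp-eval benchmark to obtain scores comparable to those reported above.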