Presentation Information

[4Yin-A-43] Evaluating the Effectiveness of Japanese Pretraining and Fine-Tuning for Language Models with Neural Long-Term Memory Modules

〇Yuki Shimoda1, Yoshinobu Kano1 (1. Shizuoka University)

Keywords:

Pretraining, Fine-Tuning, Multi-Turn

Transformers face challenges in handling very long contexts. This study examines Titans, an architecture with a neural long-term memory module that may address these limitations, and evaluates its applicability to Japanese by pretraining and supervised fine-tuning (SFT) on Japanese datasets, analyzing its behavior and tendencies in a Japanese setting. During pretraining, perplexity decreased as the segment length (attention window) increased. On Japanese MT-Bench, longer segment lengths generally yielded more stable outputs in multi-turn dialogue. Titans also outperformed the baseline on some categories and questions, and a turn-level analysis indicated that, compared with a baseline of comparable parameter count, Titans showed a stronger tendency for second-turn scores to exceed first-turn scores.
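For context, Titans augments attention over a fixed-length segment with a neural long-term memory that is written to at test time via gradient descent on an associative "surprise" loss, with momentum and a forgetting term. The following is a minimal, illustrative sketch of such an update, assuming a small MLP memory; all names and hyperparameters are assumptions for illustration and are not the implementation evaluated in this work.

```python
# Illustrative sketch of a Titans-style neural long-term memory update.
# Not the authors' implementation; hyperparameters are placeholders.
import torch
import torch.nn as nn


class NeuralMemory(nn.Module):
    """A small MLP acting as an associative memory M: key -> value."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )

    def forward(self, k: torch.Tensor) -> torch.Tensor:
        return self.net(k)


@torch.no_grad()
def memory_update(memory, k, v, momentum_state, lr=1e-2, beta=0.9, decay=1e-3):
    """One test-time write: gradient of the associative loss ||M(k) - v||^2
    ("surprise"), accumulated with momentum and damped by a forgetting term."""
    with torch.enable_grad():
        loss = ((memory(k) - v) ** 2).mean()
        grads = torch.autograd.grad(loss, list(memory.parameters()))
    for p, g, m in zip(memory.parameters(), grads, momentum_state):
        m.mul_(beta).add_(g)      # momentum carries past surprise forward
        p.mul_(1.0 - decay)       # weight decay acts as forgetting
        p.add_(m, alpha=-lr)      # gradient-descent write into the memory
    return loss.item()


if __name__ == "__main__":
    dim = 64
    mem = NeuralMemory(dim)
    momentum = [torch.zeros_like(p) for p in mem.parameters()]
    # Keys/values for one segment (random stand-ins for projected tokens).
    k, v = torch.randn(8, dim), torch.randn(8, dim)
    for step in range(3):
        print("surprise loss:", memory_update(mem, k, v, momentum))
```

In this sketch, the attention window corresponds to the segment whose keys and values are written into the memory; the study's segment-length variable would control how many tokens attention sees directly before relying on this memory.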