Presentation Information

[4Yin-A-43] Evaluating the Effectiveness of Japanese Pretraining and Fine-Tuning for Language Models with Neural Long-Term Memory Modules

〇Yuki Shimoda1, Yoshinobu Kano1 (1. Shizuoka University)

Keywords:

Pretraining, Fine-Tuning, Multi-Turn

Transformers face challenges in handling very long contexts. This study examines Titans, an architecture with a neural long-term memory module that may address these limitations, and evaluates its applicability to Japanese by pretraining and supervised fine-tuning (SFT) on Japanese datasets, analyzing its behavior and tendencies in a Japanese setting. During pretraining, perplexity decreased as the segment length (attention window) increased. On Japanese MT-Bench, longer segment lengths generally yielded more stable outputs in multi-turn dialogue. Titans also outperformed the baseline on some categories and questions, and a turn-level analysis indicated that, compared with a baseline of comparable parameter count, Titans showed a stronger tendency for second-turn scores to exceed first-turn scores.
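For context, Titans augments attention over a fixed-length segment with a neural long-term memory that is written to at test time via gradient descent on an associative "surprise" loss, with momentum and a forgetting term. The following is a minimal, illustrative sketch of such an update, assuming a small MLP memory; all names and hyperparameters are assumptions for illustration and are not the implementation evaluated in this work.

```python
# Illustrative sketch of a Titans-style neural long-term memory update.
# Not the authors' implementation; hyperparameters are placeholders.
import torch
import torch.nn as nn


class NeuralMemory(nn.Module):
    """A small MLP acting as an associative memory M: key -> value."""

    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim)
        )

    def forward(self, k: torch.Tensor) -> torch.Tensor:
        return self.net(k)


@torch.no_grad()
def memory_update(memory, k, v, momentum_state, lr=1e-2, beta=0.9, decay=1e-3):
    """One test-time write: gradient of the associative loss ||M(k) - v||^2
    ("surprise"), accumulated with momentum and damped by a forgetting term."""
    with torch.enable_grad():
        loss = ((memory(k) - v) ** 2).mean()
        grads = torch.autograd.grad(loss, list(memory.parameters()))
    for p, g, m in zip(memory.parameters(), grads, momentum_state):
        m.mul_(beta).add_(g)      # momentum carries past surprise forward
        p.mul_(1.0 - decay)       # weight decay acts as forgetting
        p.add_(m, alpha=-lr)      # gradient-descent write into the memory
    return loss.item()


if __name__ == "__main__":
    dim = 64
    mem = NeuralMemory(dim)
    momentum = [torch.zeros_like(p) for p in mem.parameters()]
    # Keys/values for one segment (random stand-ins for projected tokens).
    k, v = torch.randn(8, dim), torch.randn(8, dim)
    for step in range(3):
        print("surprise loss:", memory_update(mem, k, v, momentum))
```

In this sketch, the attention window corresponds to the segment whose keys and values are written into the memory; the study's segment-length variable would control how many tokens attention sees directly before relying on this memory.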