The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

[4Yin-A-57] Zero-Shot Triggering of Homograph Reading with Brackets using Pre-trained T5 Language Models

〇Tomoya Hirata¹, Kazuhiro Takeuchi¹ (1. Osaka Electro-Communication University)

Keywords: Homograph, Reading Disambiguation, T5, Zero-Shot

Japanese homographs share the same orthography but take different readings depending on context. Most prior work formulates reading disambiguation as classification with word-specific label sets. This paper reformulates the task as constrained text generation and applies a pre-trained Japanese T5 model in a zero-shot setting. We mask only the reading span in bracket-style furigana annotations and decode the reading in hiragana under token-level vocabulary constraints. In an evaluation on 33,877 Tatoeba examples covering 248 homographs, zero-shot T5 reaches a micro accuracy of 0.863, while GiNZA reaches 0.915. The analysis reveals clear word-level complementarity between the two systems: T5 does better on context-driven cases but remains weak on lexicalized readings.
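The masking and constraint steps described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption rather than the authors' implementation: the half-width bracket format `surface(reading)`, the use of the T5 sentinel token `<extra_id_0>` as the mask, the helper names `mask_reading`, `is_hiragana`, `allowed_token_ids`, and `micro_accuracy`, and the toy vocabulary. The sketch only prepares the masked input, selects the token ids a constrained decoder would be allowed to emit, and pools accuracy over all examples; it does not load or run a model.

```python
import re

# Assumed mask token: T5-style sentinel. The abstract does not state which token is used.
SENTINEL = "<extra_id_0>"


def mask_reading(text: str, surface: str) -> tuple[str, str]:
    """Mask only the reading inside the brackets that follow `surface`.

    Returns the masked text and the original (gold) reading.
    Assumes half-width brackets, e.g. "今日(きょう)".
    """
    pattern = re.compile(re.escape(surface) + r"\(([^()]*)\)")
    match = pattern.search(text)
    if match is None:
        raise ValueError(f"no bracketed reading found after {surface!r}")
    masked = text[: match.start(1)] + SENTINEL + text[match.end(1):]
    return masked, match.group(1)


def is_hiragana(token: str) -> bool:
    """True if the token is non-empty and consists only of hiragana letters
    (U+3041..U+3096) or the prolonged sound mark (U+30FC)."""
    return bool(token) and all(
        "\u3041" <= ch <= "\u3096" or ch == "\u30fc" for ch in token
    )


def allowed_token_ids(vocab: dict[str, int]) -> list[int]:
    """Token ids a hiragana-constrained decoder would be permitted to generate."""
    return sorted(idx for tok, idx in vocab.items() if is_hiragana(tok))


def micro_accuracy(per_word: dict[str, tuple[int, int]]) -> float:
    """Micro accuracy: pool (correct, total) counts over all words before dividing."""
    correct = sum(c for c, _ in per_word.values())
    total = sum(n for _, n in per_word.values())
    return correct / total


if __name__ == "__main__":
    masked, gold = mask_reading("今日(きょう)は晴れです", "今日")
    print(masked, gold)
    toy_vocab = {"きょ": 5, "う": 2, "日": 7, "は": 3, "abc": 9}  # hypothetical vocabulary
    print(allowed_token_ids(toy_vocab))
```

In a real decoder, the id list from `allowed_token_ids` would typically be used to restrict which tokens may be generated at each step; how the authors enforce the constraint is not specified in the abstract. Because micro accuracy pools all examples, frequent homographs weigh more heavily than rare ones, which is why a word-level breakdown can reveal patterns that the aggregate score hides.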
