JSAI 2025: The 39th Annual Conference of the Japanese Society for Artificial Intelligence

May 27–30, 2025, Osaka International Convention Center and online
The Japanese Society for Artificial Intelligence (JSAI)

[4K3-IS-2f-04] Learnability of Regular Languages in Language Models

Authors: Masaya Taniguchi (1; presenter), Naoki Negishi (2), Yusaku Nishimiya (4, 1), Keisuke Sakaguchi (2), Kentaro Inui (3, 2, 1)
Affiliations: (1) RIKEN, (2) Tohoku University, (3) Mohamed bin Zayed University of Artificial Intelligence, (4) University of Illinois Springfield
This study examines how the order in which positive and negative data are presented affects grammar acquisition in language models. We focus on a text search task in which the target grammar is a regular language. We prepare two types of data: positive data, in which a sentence conforming to the target grammar is embedded in the text, and negative data, in which no such sentence appears. Our results show that both the strategy used to sample positive and negative data and the order in which these datasets are presented influence the language model's ability to acquire the grammatical structure.
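The positive/negative data construction and presentation order described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. The target language `a(bc)*d`, the alphabet, the filler length, and the schedule names (`positive_first`, `negative_first`, `interleaved`, `shuffled`) are all hypothetical choices made for illustration, since the abstract does not specify them.

```python
import random
import re

# HYPOTHETICAL target grammar: the regular language a(bc)*d.
# The abstract does not say which regular languages were used.
TARGET = re.compile(r"a(?:bc)*d")
ALPHABET = "abcd"  # hypothetical alphabet for the surrounding text


def sample_member(rng: random.Random, max_reps: int = 3) -> str:
    """Draw a string from a(bc)*d with 0..max_reps copies of 'bc'."""
    return "a" + "bc" * rng.randint(0, max_reps) + "d"


def sample_filler(rng: random.Random, length: int) -> str:
    """Random text containing no target-language substring (rejection sampling)."""
    while True:
        text = "".join(rng.choice(ALPHABET) for _ in range(length))
        if TARGET.search(text) is None:
            return text


def make_positive(rng: random.Random, length: int = 12) -> tuple[str, int]:
    """Positive example: a target-language string embedded at a random offset."""
    member = sample_member(rng)
    left = rng.randint(0, length)
    text = sample_filler(rng, left) + member + sample_filler(rng, length - left)
    return text, 1


def make_negative(rng: random.Random, length: int = 12) -> tuple[str, int]:
    """Negative example: text of comparable length with no target-language substring."""
    total = length + len(sample_member(rng))  # same length distribution as positives
    return sample_filler(rng, total), 0


def order_dataset(pos, neg, schedule: str, rng: random.Random):
    """Arrange positive and negative examples under a HYPOTHETICAL schedule name."""
    if schedule == "positive_first":
        return pos + neg
    if schedule == "negative_first":
        return neg + pos
    if schedule == "interleaved":
        out = []
        for p, n in zip(pos, neg):
            out.extend([p, n])
        # Append any leftovers if the two lists differ in length.
        out.extend(pos[len(neg):])
        out.extend(neg[len(pos):])
        return out
    if schedule == "shuffled":
        mixed = pos + neg
        rng.shuffle(mixed)
        return mixed
    raise ValueError(f"unknown schedule: {schedule}")


if __name__ == "__main__":
    rng = random.Random(0)
    pos = [make_positive(rng) for _ in range(3)]
    neg = [make_negative(rng) for _ in range(3)]
    for text, label in order_dataset(pos, neg, "interleaved", rng):
        print(label, text)
```

Negatives are drawn with the same length distribution as positives so that text length alone does not separate the two classes, and rejection sampling guarantees that a negative example truly contains no match. Varying `schedule` while holding the sampled examples fixed isolates the effect of presentation order from the effect of the sampling strategy.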