[3Yin-A-49] Corporate Document QA Benchmark Requiring Multi-Hop Reasoning

〇TAISEI OZAKI1,2, MIKI OTA1, YUTA KAMBAYASHI1, KOKI NAGATANI1 (1. MATSUO INSTITUTE, INC., 2. Osaka Metropolitan University)

Keywords:

Benchmark dataset, Tool use, Reinforcement Learning

Corporate emails and documents contain implicit structures such as organizational hierarchies, project relationships, and temporal context. Retrieval that exploits these structures is difficult for simple semantic search and instead calls for iterative exploration by agents. To investigate whether such implicit structures can be learned, we propose EnronHop, a benchmark built on the Enron email corpus. The benchmark consists of multi-hop questions that require reasoning over sender-recipient relationships, email context, and organizational roles. We compare several proprietary and open-source models, including one trained with reinforcement learning for email search, and evaluate them in terms of accuracy, tool efficiency, and cost. We further analyze the tool invocation patterns of the trained models qualitatively to examine the extent to which they have learned these implicit structures.
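To make the idea of a multi-hop question over sender-recipient relationships concrete, here is a minimal sketch under invented assumptions: a toy mailbox, a hypothetical `MailSearchTool` class with a `search(keyword, sender)` method, and a call counter standing in for a tool-efficiency measure. None of these names, fields, or the question format come from EnronHop; they only illustrate why such a question needs a chain of searches rather than a single lookup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    sender: str
    recipients: tuple[str, ...]
    subject: str
    body: str


# Hypothetical toy mailbox; all addresses and contents are invented.
MAILBOX = [
    Email("alice@example.com", ("bob@example.com",),
          "Q3 pipeline report", "Attached is the Q3 report."),
    Email("bob@example.com", ("carol@example.com",),
          "Vendor contract", "Carol, can you check the vendor terms?"),
    Email("carol@example.com", ("dave@example.com",),
          "Budget", "Dave, the budget is approved."),
]


class MailSearchTool:
    """Hypothetical search tool that counts how often it is invoked."""

    def __init__(self, mailbox):
        self.mailbox = mailbox
        self.calls = 0

    def search(self, keyword=None, sender=None):
        self.calls += 1  # every invocation counts toward tool usage
        results = []
        for mail in self.mailbox:
            text = (mail.subject + " " + mail.body).lower()
            if keyword is not None and keyword.lower() not in text:
                continue
            if sender is not None and mail.sender != sender:
                continue
            results.append(mail)
        return results


def recipients_of(emails):
    """Collect the distinct recipients of a list of emails, sorted."""
    return sorted({r for mail in emails for r in mail.recipients})


def two_hop_answer(tool, topic, start_sender):
    """Answer: 'Whom did the recipients of start_sender's email on topic write to?'"""
    # Hop 1: find who received the start sender's email about the topic.
    intermediaries = recipients_of(tool.search(keyword=topic, sender=start_sender))
    # Hop 2: follow each intermediary's outgoing mail to its recipients.
    answers = set()
    for person in intermediaries:
        answers.update(recipients_of(tool.search(sender=person)))
    return sorted(answers)


tool = MailSearchTool(MAILBOX)
print(two_hop_answer(tool, "Q3", "alice@example.com"), tool.calls)
```

In this toy setup the answer, carol@example.com, never appears in an email about the Q3 report, so a single search on the question's keywords cannot surface it; the second hop depends on the intermediary found in the first. The counter ends at two calls, one per hop, which is the kind of quantity a tool-efficiency comparison could track.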