Presentation Information
[5Yin-A-16] ShigyoBench: Construction and Performance Evaluation of an LLM Benchmark for Japanese Professional Licensing Examinations
〇Masato Todo1, Shin-nosuke Ishikawa1,2 (1. MAMEZO Co., Ltd., 2. Rikkyo University)
Keywords:
Benchmark, Japanese Dataset
Quantitatively evaluating how well Large Language Models (LLMs) handle the practical knowledge tested in professional licensing exams is a crucial indicator of their readiness for deployment in society. In this study, we constructed "ShigyoBench," a benchmark of 8,979 multiple-choice questions collected from eight major Japanese professional exams: Real Estate Brokerage, Administrative Scrivener, Patent Attorney, Judicial Scrivener, Preliminary Bar Exam, Bar Exam, Real Estate Appraiser, and Certified Public Accountant. We established a reproducible data construction pipeline and evaluated three models. Gemini-3-Pro exceeded the passing threshold in most exams, whereas GPT-5.1 and Qwen-3-235B scored within the 40–60% range. The dataset is publicly available at: https://huggingface.co/datasets/todo1111/shigyobench
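
The comparison described above (score per exam, checked against a passing threshold) can be illustrated with a minimal sketch. It is not the authors' evaluation code. The record fields `exam`, `answer`, and `predicted`, the helper names, and the toy data and thresholds are all assumptions made for illustration; the actual dataset schema and the real passing thresholds are not stated in this abstract.

```python
from collections import defaultdict


def accuracy_by_exam(records):
    """Return {exam name: fraction of questions answered correctly}.

    Each record is a dict with hypothetical keys:
      "exam"      - name of the exam the question belongs to
      "answer"    - the gold choice label
      "predicted" - the choice label the model selected
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["exam"]] += 1
        if record["predicted"] == record["answer"]:
            correct[record["exam"]] += 1
    return {exam: correct[exam] / total[exam] for exam in total}


def meets_threshold(accuracies, thresholds):
    """Return {exam name: True if accuracy >= that exam's threshold}.

    Exams without a supplied threshold are skipped rather than guessed.
    """
    return {
        exam: accuracy >= thresholds[exam]
        for exam, accuracy in accuracies.items()
        if exam in thresholds
    }


# Toy data only: these are not ShigyoBench questions or real thresholds.
toy_records = [
    {"exam": "exam_a", "answer": "1", "predicted": "1"},
    {"exam": "exam_a", "answer": "3", "predicted": "2"},
    {"exam": "exam_b", "answer": "4", "predicted": "4"},
]
toy_thresholds = {"exam_a": 0.7, "exam_b": 0.6}

toy_accuracies = accuracy_by_exam(toy_records)
toy_passed = meets_threshold(toy_accuracies, toy_thresholds)
```

Keeping the thresholds as a separate per-exam mapping reflects that each licensing exam has its own passing criterion, so a single global cutoff would not be appropriate.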
