The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

[5Yin-A-16]ShigyoBench: Construction and Performance Evaluation of an LLM Benchmark for Japanese Professional Licensing Examinations

〇Masato Todo¹, Shin-nosuke Ishikawa¹,² (1. MAMEZO Co., Ltd., 2. Rikkyo University)

Keywords: Benchmark, Japanese Dataset

Quantitatively evaluating how well Large Language Models (LLMs) handle the practical knowledge tested in professional licensing exams is a crucial indicator of their readiness for real-world deployment. In this study, we constructed "ShigyoBench," a benchmark of 8,979 multiple-choice questions collected from eight major Japanese professional exams: Real Estate Brokerage, Administrative Scrivener, Patent Attorney, Judicial Scrivener, Preliminary Bar Exam, Bar Exam, Real Estate Appraiser, and Certified Public Accountant. We established a reproducible data construction pipeline and evaluated three models. Gemini-3-Pro exceeded the passing threshold in most exams, whereas the performance of GPT-5.1 and Qwen-3-235B remained within the 40–60% range. The dataset is publicly available at: https://huggingface.co/datasets/todo1111/shigyobench
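
As an illustration of the kind of comparison described above, the following is a minimal sketch of per-exam scoring against a passing threshold. It is not the authors' pipeline: the record keys (`exam`, `answer`, `prediction`), the function names, and any threshold values passed in are hypothetical, and the actual dataset schema and official passing criteria may differ.

```python
from collections import defaultdict


def accuracy_by_exam(records):
    """Return {exam: fraction of correct answers}.

    Each record is a dict with the hypothetical keys
    'exam', 'answer' (gold choice) and 'prediction' (model choice).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        total[record["exam"]] += 1
        if record["prediction"] == record["answer"]:
            correct[record["exam"]] += 1
    return {exam: correct[exam] / total[exam] for exam in total}


def passes_threshold(accuracies, thresholds):
    """Return {exam: True if accuracy meets or exceeds that exam's threshold}.

    `thresholds` maps exam name to a fraction in [0, 1]; the values are
    supplied by the caller and are assumptions, not official figures.
    """
    return {exam: acc >= thresholds[exam] for exam, acc in accuracies.items()}
```

A caller would build `records` from model outputs on the benchmark questions and supply one threshold per exam. Treating each exam's pass mark as a single accuracy cutoff is a simplification, since real licensing exams may apply section-level or relative criteria.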
