The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

2:40 PM - 2:55 PM JST (5:40 AM - 5:55 AM UTC)

[1H3-OS-40-05] MAPLE: Multi-Aspect Panels of LLM Evaluators for Open-Ended Questions

〇Michinori Jinji¹, Kyohei Atarashi¹, Koh Takeuchi¹, Hisashi Kashima¹ (1. Kyoto University)

Keywords:

LLM-as-a-Judge, Multi-agent

LLM-as-a-Judge has attracted growing attention as a way to automatically evaluate responses to open-ended questions using Large Language Models (LLMs). While this approach can substantially reduce the need for human assessment, which is costly in both time and money, the gap between human and LLM evaluations remains a key challenge. To improve the alignment of LLM evaluations with human evaluations, we propose Multi-Aspect Panels of LLM Evaluators (MAPLE), a framework that conducts evaluation using multiple criteria and multiple LLMs and then integrates the results. MAPLE aggregates criterion-wise comparative judgments from several LLM evaluators by estimating both the importance of each criterion and the reliability of each evaluator. In experiments on an essay scoring task, we compare MAPLE against multiple baselines and show that it consistently achieves higher agreement with human evaluations than the baselines. These results indicate that integrating evaluations across multiple LLMs and multiple criteria is an effective strategy for LLM-as-a-Judge.
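The aggregation idea described above can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify how criterion importance or evaluator reliability is estimated, so the sketch assumes both are already given as fixed weights and simply combines them in a weighted vote. The function name `aggregate_preference`, the judge and criterion names, and all numeric values are hypothetical.

```python
from typing import Dict


def aggregate_preference(
    judgments: Dict[str, Dict[str, int]],
    criterion_weight: Dict[str, float],
    evaluator_reliability: Dict[str, float],
) -> float:
    """Combine criterion-wise pairwise votes into one signed score.

    judgments[evaluator][criterion] is +1 if that evaluator prefers
    response A on that criterion and -1 if it prefers response B.
    A positive return value favors A; a negative one favors B.
    """
    score = 0.0
    for evaluator, per_criterion in judgments.items():
        for criterion, vote in per_criterion.items():
            # Each vote is scaled by how much we trust the evaluator
            # and by how much the criterion matters (both assumed given).
            score += (
                evaluator_reliability[evaluator]
                * criterion_weight[criterion]
                * vote
            )
    return score


# Hypothetical panel of three LLM judges and two criteria.
judgments = {
    "judge_1": {"coherence": +1, "grammar": -1},
    "judge_2": {"coherence": +1, "grammar": +1},
    "judge_3": {"coherence": -1, "grammar": -1},
}
criterion_weight = {"coherence": 0.75, "grammar": 0.25}
evaluator_reliability = {"judge_1": 1.0, "judge_2": 0.5, "judge_3": 0.25}

score = aggregate_preference(judgments, criterion_weight, evaluator_reliability)
preferred = "A" if score > 0 else "B"
```

In this toy example the two more reliable judges favor response A on the more heavily weighted criterion, so the aggregate score is positive and A is preferred even though the raw votes are split three to three.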
