Presentation Information
[1H3-OS-40-05] MAPLE: Multi-Aspect Panels of LLM Evaluators for Open-Ended Questions
〇Michinori Jinji¹, Kyohei Atarashi¹, Koh Takeuchi¹, Hisashi Kashima¹ (1. Kyoto University)
Keywords:
LLM-as-a-Judge, Multi-agent
LLM-as-a-Judge has attracted growing attention as a way to automatically evaluate responses to open-ended questions using Large Language Models (LLMs). While this approach can substantially reduce the need for human assessment, which is costly in both time and money, the gap between human and LLM evaluations remains a key challenge. To improve the alignment of LLM evaluations with human evaluations, we propose Multi-Aspect Panels of LLM Evaluators (MAPLE), a framework that conducts evaluation using multiple criteria and multiple LLMs and then integrates the results. MAPLE aggregates criterion-wise comparative judgments from several LLM evaluators by estimating both the importance of each criterion and the reliability of each evaluator. In experiments on an essay scoring task, MAPLE consistently achieves higher agreement with human evaluations than multiple baselines. These results indicate that integrating evaluations across multiple LLMs and multiple criteria is an effective strategy for LLM-as-a-Judge.
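To make the aggregation idea concrete, here is a minimal sketch of how criterion-wise comparative judgments from several evaluators might be combined. It is an illustration under stated assumptions, not the MAPLE procedure: the abstract does not specify how criterion importance or evaluator reliability is estimated, so the weights below are fixed by hand, and the function name `aggregate`, the evaluator names, and the criteria are all hypothetical.

```python
# Illustrative sketch only: a weighted vote over criterion-wise pairwise judgments.
# The weights are hand-set assumptions; estimating them is not shown here.

def aggregate(judgments, criterion_weight, evaluator_reliability):
    """Return a signed score: > 0 favors response A, < 0 favors response B.

    judgments[evaluator][criterion] is +1 if that evaluator prefers A
    under that criterion and -1 if it prefers B.
    """
    score = 0.0
    for evaluator, per_criterion in judgments.items():
        for criterion, vote in per_criterion.items():
            score += (
                evaluator_reliability[evaluator]
                * criterion_weight[criterion]
                * vote
            )
    return score


# Hypothetical judgments from three LLM evaluators on two criteria.
judgments = {
    "llm_1": {"coherence": +1, "grammar": -1},
    "llm_2": {"coherence": +1, "grammar": +1},
    "llm_3": {"coherence": -1, "grammar": -1},
}
criterion_weight = {"coherence": 0.7, "grammar": 0.3}                  # assumed
evaluator_reliability = {"llm_1": 0.5, "llm_2": 0.3, "llm_3": 0.2}     # assumed

score = aggregate(judgments, criterion_weight, evaluator_reliability)
preferred = "A" if score > 0 else "B"  # a zero score falls to B in this sketch
print(preferred, round(score, 3))
```

In this toy setup a more reliable evaluator and a more important criterion both pull the final decision harder, which is the intuition behind weighting judgments rather than taking a plain majority vote.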
