Presentation Information
[2H4-OS-2a-04] LLM-as-a-Judge Evaluation and Analysis of Financial News Articles Generated Based on Factors of Stock Price Fluctuation
〇Yurina Kosai1, Rikuto Tsuchida1, Yucheng Xie1, Takehito Utsuro1 (1. University of Tsukuba)
Keywords:
LLM-as-a-Judge, automatic evaluation, generating stock price fluctuation articles, factors of stock price fluctuation, large language models
This paper proposes an LLM-as-a-Judge framework for evaluating stock price fluctuation articles that are automatically generated from economic news, corporate disclosures, and stock price fluctuation data. The article generation framework emulates the workflow of human financial journalists by analyzing recent stock movements and incorporating relevant causal factors extracted from textual and numerical information. In particular, the generation process uses news articles and numerical stock price data, including price changes and fluctuation ranges over the past three days. Using these automatically generated articles, this study places particular emphasis on the LLM-as-a-Judge evaluation methodology. We conduct a structured human evaluation and compare it with the LLM-as-a-Judge automatic metric. We analyze the correlation between these evaluation methods to assess their reliability. Furthermore, through comparisons between zero-shot and few-shot prompting, we examine the effectiveness of the proposed framework and the validity of LLM-based evaluation for assessing factual and causal consistency in financial text generation.
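
The correlation analysis between human and LLM-as-a-Judge scores can be illustrated with a minimal sketch. The abstract does not say which correlation coefficient is used. Spearman's rank correlation is assumed here as one common choice for ordinal ratings, and the function names and the example scores are hypothetical illustrations, not data or code from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical per-article scores on a 1-5 scale (illustrative only).
    human_scores = [4, 2, 5, 3, 1, 4]
    llm_judge_scores = [5, 2, 4, 3, 2, 4]
    print(f"Spearman rho = {spearman(human_scores, llm_judge_scores):.3f}")
```

A rank-based coefficient is used in this sketch because rubric scores from human raters and from an LLM judge are typically ordinal, so agreement in ordering matters more than agreement in absolute values.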
