Presentation Information

[5L3-OS-6b-01] Document Sampling Strategies for LLM-based Topic Labeling and Their Quality Evaluation

〇Hiromasa Sato1, Kazushi Okamoto1, Koki Karube1, Kei Harada1, Atsushi Shibata1 (1. The University of Electro-Communications)

Keywords:

Topic Labeling, Topic Model, Large Language Models, Prompt Engineering

Topic word lists generated by topic models are often difficult for humans to interpret. Moreover, existing automated labeling methods that use large language models (LLMs) rely solely on topic words and therefore lack contextual information. In this study, we propose a method that improves label quality by providing the LLM with representative documents, selected according to topic membership scores, in addition to topic words. We evaluated four document sampling strategies against a baseline that uses only topic words, using expert questionnaires on label consistency and the cosine similarity between topic labels and the documents within each topic. The strategy that samples documents with moderate membership scores outperformed the baseline in the questionnaire evaluation (average 5.59 vs. 5.53) and also achieved the best similarity scores, ranking 1.6 places higher on average. These findings suggest that an appropriate sampling strategy contributes to improved topic label quality.
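The core idea can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the exact selection rule for "moderate membership" (here, documents whose scores lie closest to the topic's median score) and the embedding source for the cosine-similarity evaluation are assumptions.

```python
import math

def sample_moderate(docs, scores, k=3):
    # Pick the k documents whose topic-membership scores lie closest to
    # the median score for the topic -- one plausible reading of the
    # "moderate membership" strategy (assumed, not the paper's exact rule).
    ranked = sorted(range(len(docs)), key=lambda i: scores[i])
    median = scores[ranked[len(ranked) // 2]]
    closest = sorted(range(len(docs)), key=lambda i: abs(scores[i] - median))
    return [docs[i] for i in closest[:k]]

def cosine(u, v):
    # Cosine similarity between two embedding vectors, as used to compare
    # a generated topic label with the documents inside the topic.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For example, with documents scored [0.95, 0.60, 0.55, 0.50, 0.10], the strategy skips both the most central document (0.95) and the most marginal one (0.10), sampling instead from the middle of the membership distribution before passing the sampled texts, alongside the topic words, to the LLM prompt.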