Presentation Information

[5L3-OS-6b-01] Document Sampling Strategies for LLM-based Topic Labeling and Their Quality Evaluation

〇Hiromasa Sato1, Kazushi Okamoto1, Koki Karube1, Kei Harada1, Atsushi Shibata1 (1. The University of Electro-Communications)

Keywords:

Topic Labeling, Topic Model, Large Language Models, Prompt Engineering

Topic word lists generated by topic models are often difficult for humans to interpret. Moreover, existing automated labeling methods that use large language models (LLMs) rely solely on topic words and therefore lack contextual information. In this study, we propose a method that improves label quality by providing the LLM with representative documents, selected according to topic membership scores, in addition to topic words. We evaluated four document sampling strategies against a baseline that uses only topic words, using expert questionnaires on label consistency and the cosine similarity between topic labels and the documents within each topic. The strategy that samples documents with moderate membership scores outperformed the baseline in the questionnaire evaluation (average 5.59 vs. 5.53) and also achieved the best similarity scores, ranking 1.6 places higher on average. These findings suggest that an appropriate sampling strategy contributes to improved topic label quality.
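The core idea can be sketched in a few lines. The snippet below is a minimal illustration, not the authors' implementation: the exact selection rule for "moderate membership" (here, documents whose scores lie closest to the topic's median score) and the embedding source for the cosine-similarity evaluation are assumptions.

```python
import math

def sample_moderate(docs, scores, k=3):
    # Pick the k documents whose topic-membership scores lie closest to
    # the median score for the topic -- one plausible reading of the
    # "moderate membership" strategy (assumed, not the paper's exact rule).
    ranked = sorted(range(len(docs)), key=lambda i: scores[i])
    median = scores[ranked[len(ranked) // 2]]
    closest = sorted(range(len(docs)), key=lambda i: abs(scores[i] - median))
    return [docs[i] for i in closest[:k]]

def cosine(u, v):
    # Cosine similarity between two embedding vectors, as used to compare
    # a generated topic label with the documents inside the topic.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

For example, with documents scored [0.95, 0.60, 0.55, 0.50, 0.10], the strategy skips both the most central document (0.95) and the most marginal one (0.10), sampling instead from the middle of the membership distribution before passing the sampled texts, alongside the topic words, to the LLM prompt.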