Presentation Information

[5Yin-A-27] Corporate Feature Extraction from Annual Securities Reports Using Clustering and LLM Evaluation

〇Yuichiro Okugawa1, Toshimitsu Tanaka1, Tomomi Nagao1, Akihiro Katsuno1, Hiroaki Kawata1 (1. NTT, Inc.)

Keywords:

Extraction of Corporate Characteristics, Securities Reports, Clustering

This study proposes a framework for extracting sentences that are both distinctive and important to an individual company from the “Business Overview” section of its annual securities report. Distinctiveness is quantified independently using a kNN-based distance, while importance is assessed by a large language model (LLM). The two measures are then integrated through a multiplicative model. In a comparison with human evaluations, importance shows a statistically significant positive correlation with human judgments, whereas distinctiveness shows a negative correlation. As a result, simple multiplicative integration does not align with human judgment. These findings suggest that distinctiveness and importance have fundamentally different structural properties.
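The scoring pipeline described above can be illustrated with a minimal sketch. The abstract does not specify the sentence embedding, the value of k, the distance metric, the score normalization, or the LLM prompt, so everything below is an assumption made for illustration: distinctiveness is taken as the mean Euclidean distance to the k nearest other sentences, both scores are min-max scaled, and the importance values are hypothetical placeholders standing in for LLM ratings.

```python
import numpy as np


def knn_distinctiveness(embeddings, k=2):
    """Mean Euclidean distance from each sentence to its k nearest other sentences.

    Assumption: the abstract only says "kNN-based distance"; the metric and
    the aggregation (mean) are illustrative choices. Requires k <= n - 1.
    """
    x = np.asarray(embeddings, dtype=float)
    diffs = x[:, None, :] - x[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    np.fill_diagonal(dists, np.inf)  # a sentence is not its own neighbour
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


def min_max(values):
    """Scale scores to [0, 1]; returns all zeros if every score is equal."""
    v = np.asarray(values, dtype=float)
    span = v.max() - v.min()
    if span == 0:
        return np.zeros_like(v)
    return (v - v.min()) / span


def combined_score(distinctiveness, importance):
    """Multiplicative integration of the two (normalized) scores."""
    return min_max(distinctiveness) * min_max(importance)


# Toy 2-D "embeddings": three sentences close together and one outlier.
toy_embeddings = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0]]
# Hypothetical importance ratings standing in for LLM output.
toy_importance = [0.9, 0.2, 0.5, 0.1]

toy_distinctiveness = knn_distinctiveness(toy_embeddings, k=2)
toy_combined = combined_score(toy_distinctiveness, toy_importance)
```

In this toy data the outlier sentence is the most distinctive but carries the lowest importance rating, so its product is zero after scaling. That is one way a multiplicative score can rank sentences differently from either measure alone; it is not a reproduction of the study's results.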