The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

5:45 PM - 6:00 PM JST (8:45 AM - 9:00 AM UTC)

[2F6-OS-19b-02] Structuring Financial Unstructured Data Using Multimodal Generative AI Models

〇Katsuya Ito¹, Kei Nakagawa² (1. INDX Co., Ltd., 2. Osaka Metropolitan University)

Keywords: Unstructured Data, Financial Engineering, Financial Statement Analysis

Extracting table information from Annual Securities Reports is a difficult task because the tables often have complex layouts, merged cells, and multi-level headers.
In this study, we propose a method that optimizes table structuring with a vision-language model (VLM) from two perspectives: text processing and image processing. For text processing, we go beyond simple Markdown conversion. We inject domain-specific knowledge into the prompt, restrict output tokens to reduce hallucinations, and automatically add relevant document context.
These steps improve the model’s interpretation accuracy.
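
The text-side steps can be pictured with a minimal sketch. It is not the authors' implementation: the names `PromptParts`, `build_prompt`, and `build_request` are hypothetical, the prompt wording is invented for illustration, and "restricting output tokens" is interpreted here as a cap on generation length, which is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class PromptParts:
    """Hypothetical container for the three kinds of text-side guidance."""
    domain_notes: list[str] = field(default_factory=list)      # injected domain knowledge
    context_snippets: list[str] = field(default_factory=list)  # surrounding document context
    max_output_tokens: int = 2048  # illustrative cap, not a value from the study


def build_prompt(table_markdown: str, parts: PromptParts) -> str:
    """Assemble one prompt string from the table text and the guidance."""
    sections = ["You are structuring a table from an Annual Securities Report."]
    if parts.domain_notes:
        notes = "\n".join(f"- {note}" for note in parts.domain_notes)
        sections.append("Domain notes:\n" + notes)
    if parts.context_snippets:
        sections.append("Document context:\n" + "\n".join(parts.context_snippets))
    sections.append("Output only the structured table, with no commentary.")
    sections.append("Table:\n" + table_markdown)
    return "\n\n".join(sections)


def build_request(table_markdown: str, parts: PromptParts) -> dict:
    """Bundle the prompt with the output-length cap for a (hypothetical) model call."""
    return {
        "prompt": build_prompt(table_markdown, parts),
        "max_output_tokens": parts.max_output_tokens,
    }
```

The resulting dictionary would be passed to whichever VLM client is in use; the model call itself is deliberately left out because the abstract does not name one.
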
For image processing, we systematically evaluate rendering parameters such as font size, grid lines, and DPI. We show that optimizing visual clarity has a dominant impact on performance.
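
A systematic evaluation of rendering parameters can be sketched as a grid sweep. The parameter values below are illustrative assumptions rather than the settings used in the study, and `score` stands in for whatever evaluation routine renders the table, queries the model, and returns an accuracy.

```python
from itertools import product
from typing import Callable, Iterator

# Illustrative candidate values; the abstract does not list the actual ones.
FONT_SIZES = [10, 12, 14]    # points
GRID_LINES = [False, True]   # whether cell borders are drawn
DPIS = [72, 150, 300]        # render resolution


def render_configs() -> Iterator[dict]:
    """Yield every combination of the rendering parameters."""
    for font_size, grid_lines, dpi in product(FONT_SIZES, GRID_LINES, DPIS):
        yield {"font_size": font_size, "grid_lines": grid_lines, "dpi": dpi}


def best_config(score: Callable[[dict], float]) -> dict:
    """Return the configuration with the highest score.

    `score` is a caller-supplied (hypothetical) function mapping a
    rendering configuration to an accuracy-like number.
    """
    return max(render_configs(), key=score)
```

With three font sizes, two grid settings, and three DPI values, the sweep covers 18 configurations, so an exhaustive search is cheap relative to the model calls it triggers.
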
Through a quantitative comparison of the two approaches, we find that image processing provides the larger performance gain, while text-based guidance contributes in a complementary way.
On the UFO-2024 dataset, our method achieves 88.2% cell localization accuracy and 86.5% value extraction accuracy.
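
The abstract does not define the two metrics. One plausible reading, given here only as an assumption, treats a table as a mapping from (row, column) positions to cell text: localization counts gold cells whose position appears in the prediction, and value extraction counts gold cells whose predicted text also matches.

```python
Cell = tuple[int, int]  # (row, column) position of a cell


def cell_localization_accuracy(gold: dict[Cell, str], pred: dict[Cell, str]) -> float:
    """Fraction of gold cells whose position is present in the prediction."""
    if not gold:
        return 0.0
    located = sum(1 for pos in gold if pos in pred)
    return located / len(gold)


def value_extraction_accuracy(gold: dict[Cell, str], pred: dict[Cell, str]) -> float:
    """Fraction of gold cells whose predicted text matches after trimming whitespace."""
    if not gold:
        return 0.0
    hits = sum(
        1
        for pos, value in gold.items()
        if pos in pred and pred[pos].strip() == value.strip()
    )
    return hits / len(gold)
```

Under this reading, value extraction accuracy can never exceed cell localization accuracy, which is consistent with the reported 86.5% and 88.2%, although the study may define the metrics differently.
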
