The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

[1Yin-B-64] A Study on OCR Accuracy Improvement in LLMs through Visual Granularity and Context Augmentation

〇Tomoki Takekawa¹, Takayuki Shimotomai¹, Yunya Hoshino¹, Koki Takeishi¹, Kohei Tanaka¹ (1. Headwaters Co., Ltd.)

Keywords:

Industrial Applications, Image Recognition, Medical Applications

In the digitization of Japanese documents, low-quality images captured by smartphones and complex layouts are major factors that degrade recognition accuracy. This study proposes two input strategies to improve the OCR performance of large language models (LLMs): layout-aware image splitting (Grid Split) and contextual supplementation using an external OCR engine (Hybrid approach). Experimental results show that increasing visual resolution through image splitting achieves the highest OCR accuracy without relying on an external engine.
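
As a rough illustration of the image-splitting idea, the sketch below computes crop boxes for a uniform grid with a small overlap between neighboring tiles. The abstract does not describe how Grid Split chooses its boundaries, so the uniform grid, the `overlap` parameter, and the function name `grid_boxes` are all assumptions made for illustration and not the authors' method. Each box uses the (left, top, right, bottom) pixel convention, so each tile could then be cropped and sent to an LLM as a separate, higher-resolution input.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


def grid_boxes(width: int, height: int, rows: int, cols: int,
               overlap: int = 0) -> List[Box]:
    """Return crop boxes for a rows x cols grid over a width x height image.

    Hypothetical helper: a uniform grid stands in for the layout-aware
    splitting named in the abstract, whose details are not given there.
    Each tile is padded by `overlap` pixels on every side (clamped to the
    image bounds) so text cut by a tile boundary appears whole in a neighbor.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be >= 1")
    if overlap < 0:
        raise ValueError("overlap must be >= 0")

    boxes: List[Box] = []
    for r in range(rows):
        # Integer division keeps tiles contiguous even when the image
        # size is not an exact multiple of the grid size.
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((
                max(0, left - overlap),
                max(0, top - overlap),
                min(width, right + overlap),
                min(height, bottom + overlap),
            ))
    return boxes


if __name__ == "__main__":
    # Tiles are listed row by row, left to right.
    for box in grid_boxes(1000, 800, rows=2, cols=2, overlap=20):
        print(box)
```

The overlap is a design choice in this sketch: without it, a character lying on a tile boundary would be split across two crops and likely misread in both.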
