Presentation Information
[1Yin-B-64] A Study on OCR Accuracy Improvement in LLMs through Visual Granularity and Context Augmentation
〇Tomoki Takekawa1, Takayuki Shimotomai1, Yunya Hoshino1, Koki Takeishi1, Kohei Tanaka1 (1. Headwaters Co., Ltd.)
Keywords:
Industrial Applications, Image Recognition, Medical Applications
In the digitization of Japanese documents, low-quality images captured by smartphones and complex layouts are major causes of degraded recognition accuracy. This study proposes two input strategies for improving the OCR performance of large language models (LLMs): layout-aware image splitting (Grid Split) and contextual supplementation with the output of an external OCR engine (Hybrid approach). Experimental results show that increasing visual resolution through image splitting achieves the highest OCR accuracy, and does so without relying on an external engine.
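The two input strategies can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the layout-aware split is computed or how external OCR text is passed to the model. The sketch assumes a uniform grid with a small pixel overlap in place of layout analysis, and the function names `grid_boxes` and `hybrid_prompt`, the `overlap` parameter, and the prompt wording are all hypothetical.

```python
from typing import List, Tuple

# (left, top, right, bottom) in pixels; the same order Pillow's Image.crop expects.
Box = Tuple[int, int, int, int]


def grid_boxes(width: int, height: int, rows: int, cols: int,
               overlap: int = 0) -> List[Box]:
    """Return crop boxes for a rows x cols grid over a width x height page.

    Assumption: a uniform grid stands in for the layout-aware split.
    Each tile is padded by `overlap` pixels (clamped to the page) so that
    characters lying on a tile boundary appear whole in at least one tile.
    """
    if rows < 1 or cols < 1:
        raise ValueError("rows and cols must be >= 1")
    if overlap < 0:
        raise ValueError("overlap must be >= 0")
    boxes: List[Box] = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            boxes.append((
                max(0, left - overlap),
                max(0, top - overlap),
                min(width, right + overlap),
                min(height, bottom + overlap),
            ))
    return boxes


def hybrid_prompt(external_ocr_text: str) -> str:
    """Build a text prompt that supplies external OCR output as context.

    Assumption: the wording is illustrative only. The image itself would be
    sent alongside this text through whatever multimodal API is in use.
    """
    return (
        "Transcribe the text in the attached image exactly as written.\n"
        "A draft from another OCR engine is given below; it may contain "
        "errors, so use it only as a hint.\n"
        "--- draft ---\n"
        f"{external_ocr_text}\n"
        "--- end draft ---"
    )


if __name__ == "__main__":
    for box in grid_boxes(1000, 800, rows=2, cols=2, overlap=10):
        print(box)
```

Each tile would be cropped and sent to the model as a separate image, so the same text occupies more pixels per request than it does in the full page. The per-tile transcriptions then need to be concatenated in reading order, which is where a genuinely layout-aware split would matter most.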
