Presentation Information
[2F6-OS-19b-01] Entity Extraction for Occluded Texts with Multi Large Language Model
〇Hyakka Nakada1, Yoshiyasu Tanaka2 (1. Recruit Co., Ltd., 2. Yamadatanaka Co., Ltd.)
Keywords:
LLM, OCR, Multi modal, Prompt engineering
Optical Character Recognition (OCR) is a long-standing technology for extracting text from images, widely applied in tasks ranging from invoice processing to the decipherment of historical documents. Although deep learning has significantly improved OCR accuracy, simply outputting the recognized text is often insufficient for practical applications; compensating for missing entities is crucial for efficient information management. Recent advances in generative models, namely Large Language Models (LLMs) and Multimodal LLMs (MLLMs), offer promising solutions for such entity extraction. In general, however, OCR accuracy degrades significantly under real-world image degradation such as rotation, distortion, and text truncation. Existing MLLM-based entity extraction studies have evaluated robustness against rotation and distortion, with high character recognition accuracy, but the impact of significant text truncation remains unknown. This research investigates the performance of MLLMs in entity extraction under text truncation. We systematically vary the degree of truncation in document images and analyze its impact on entity extraction accuracy. We confirmed that MLLMs, unlike traditional OCR tools, can leverage contextual information from the surrounding text to infer and complete truncated entities effectively.
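As a rough illustration of the evaluation idea described above (varying the degree of truncation and measuring entity extraction accuracy), the following is a minimal sketch and not the authors' implementation. The image representation (rows of pixel values), the right-edge cropping rule, the exact-match metric, and the names `truncate_right`, `exact_match_accuracy`, `sweep`, and the `extract` callback are all assumptions made for this example; `extract` stands in for whatever OCR tool or MLLM is being evaluated.

```python
from typing import Callable, Dict, List, Sequence

# Assumption for this sketch: a grayscale image is a list of rows of pixel values.
Image = List[List[int]]


def truncate_right(image: Image, fraction: float) -> Image:
    """Drop the rightmost `fraction` of columns to simulate text cut off at the edge."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    if not image:
        return []
    width = len(image[0])
    keep = max(1, width - int(width * fraction))  # always keep at least one column
    return [row[:keep] for row in image]


def exact_match_accuracy(predicted: Dict[str, str], gold: Dict[str, str]) -> float:
    """Share of gold entity fields whose predicted value matches exactly."""
    if not gold:
        return 0.0
    hits = sum(1 for key, value in gold.items() if predicted.get(key) == value)
    return hits / len(gold)


def sweep(
    image: Image,
    gold: Dict[str, str],
    extract: Callable[[Image], Dict[str, str]],  # hypothetical extractor (OCR or MLLM wrapper)
    fractions: Sequence[float],
) -> Dict[float, float]:
    """Map each truncation fraction to the extractor's accuracy on the truncated image."""
    return {
        f: exact_match_accuracy(extract(truncate_right(image, f)), gold)
        for f in fractions
    }
```

Passing the same `fractions` to `sweep` with two different `extract` callables (for example, one wrapping a conventional OCR pipeline and one wrapping an MLLM) would give accuracy curves that can be compared across truncation levels.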
