Presentation Information

[A-14-06] Object-guided Visual Segmentation & Compression for Improving VLM on Streaming Video QA

〇Zhi Li1, Yanan Wang1, Hao Niu1, Julio Vizcarra1, Masato Taya1 (1. KDDI Research Inc.)

Keywords:

Multi-modal Large Language Model, Streaming Video Question Answering