Presentation Information
[A-14-06] Object-guided Visual Segmentation & Compression for Improving VLM on Streaming Video QA
〇Zhi Li1, Yanan Wang1, Hao Niu1, Julio Vizcarra1, Masato Taya1 (1. KDDI Research Inc.)
Keywords:
Multi-modal Large Language Model, Streaming Video Question Answering