Presentation Information
[B-7-33] Chunked KV Cache Control for Efficient Vision-Language Model Inference
◎Kenshiro Wada1, Kenzo Okuda1, Hiroki Baba1, Naoki Kimishima1, Kentaro Hayashi1, Tomonori Takeda1 (1. NTT)
Keywords:
LLM, VLM, Transformer, KV Cache, Prefill, In-Network Computing
