The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

[4Yin-B-24] A Study on Acceleration for Vision-Language-Action Model Inference with ActCache

〇Ryuji Oi¹, Hikari Otsuka¹, Yuki Ichikawa¹, Tatsuya Kaneko¹, Masato Motomura¹, Daichi Fujiki¹ (1. Institute of Science Tokyo)

Keywords: Deep Learning, Physical AI, Robotics Foundation Models

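Diffusion-based Vision-Language-Action models, which combine a vision-language model with a diffusion-model-based action head, achieve high success rates on robot control tasks.
However, the iterative denoising process inside the action head has high latency and is the bottleneck of inference.
In this study, we therefore propose a method that reuses cached past actions as prior knowledge, reducing the computation required for action generation and accelerating inference.
In experiments in a simulation environment, the method maintains a high success rate even when generating actions with a very small number of denoising steps, improving the success rate by up to 24.7% compared with the baseline.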
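
The abstract does not say how the cached actions enter the denoising process. The following is therefore only a toy sketch of one possible reading, in which the previous action initializes the denoising loop instead of pure noise. The denoiser, the function names (toy_denoise_step, generate_action, max_error), and all numbers are hypothetical illustrations, not the authors' implementation or results.

```python
import random


def toy_denoise_step(x, target, rate=0.5):
    """Hypothetical stand-in for one denoising step of an action head:
    moves each component of x part of the way toward target."""
    return [xi + rate * (ti - xi) for xi, ti in zip(x, target)]


def generate_action(target, num_steps, init=None, rng=None):
    """Run num_steps of the toy denoiser.

    If init is given (a cached past action, by assumption), start from it;
    otherwise start from Gaussian noise, as a cold-start sampler would.
    """
    rng = rng or random.Random(0)
    if init is not None:
        x = list(init)
    else:
        x = [rng.gauss(0.0, 1.0) for _ in target]
    for _ in range(num_steps):
        x = toy_denoise_step(x, target)
    return x


def max_error(x, target):
    """Largest absolute component-wise difference between x and target."""
    return max(abs(a - b) for a, b in zip(x, target))


# Illustrative values only: consecutive control steps are assumed to
# call for similar actions, so the cached action is already near the target.
prev_action = [0.50, -0.20, 0.10]  # cached from the previous control step
target = [0.52, -0.18, 0.12]       # action the toy denoiser converges to

cold = generate_action(target, num_steps=2, rng=random.Random(0))
warm = generate_action(target, num_steps=2, init=prev_action)

print("cold-start error:", max_error(cold, target))
print("warm-start error:", max_error(warm, target))
```

In this toy setting the warm start sits closer to the target after the same two steps, because it begins closer. Whether the proposed method behaves this way depends on details the abstract does not give.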
