Presentation Information
[2E5-GS-10o-02] Design of a Multimodal LLM for Time-Series Data Interpretation and Driving Behavior Explanation
〇Shugo Matsusaka1, Koichi Seki1, Hideaki Bunazawa1, Takuya Shintate2, Shuheng You2, Yongpeng Cao2, Xi Xue2, Akira Yoshida2, Kunio Suzuki2 (1. Toyota Motor Corporation, 2. NABLAS Inc.)
Keywords:
Large Language Model, Multimodal AI, Time-Series Data, Caption Generation, LLM as a Judge
Recent advances in Large Language Models (LLMs) have extended their multimodal capabilities to diverse data types such as images, video, and audio. In the automotive industry, time-series data such as vehicle speed, acceleration, and steering angle have been used to understand driving situations and detect anomalies, but traditional approaches require a separate model for each task and are limited in accuracy. To address these issues, this study proposes the Time-Series Language Model (TSLM), a novel multimodal LLM architecture in which features extracted by a time-series encoder are passed through a projection layer and fed directly into an LLM. For a driving behavior explanation task, we constructed a dataset of paired time-series and text from real driving data and trained the model on it. In an LLM-as-a-judge evaluation, the proposed method outperformed baseline Vision Language Models (VLMs).
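The abstract does not specify the encoder, the projection, or how projected features are combined with text, so the following is only a minimal Python sketch of the general encoder-plus-projection idea under stated assumptions. A hypothetical patch-statistics function stands in for the learned time-series encoder, a single linear layer serves as the projection, and the projected time-series tokens are simply prepended to the prompt's token embeddings. All names (`patch_encode`, `project`, `build_llm_input`), the patch length, the channel order, and the embedding size are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import List

Matrix = List[List[float]]

def patch_encode(series: Matrix, patch_len: int) -> Matrix:
    """HYPOTHETICAL stand-in for a learned time-series encoder.

    Splits a (time x channel) series into non-overlapping patches and
    summarizes each patch by the per-channel mean and standard deviation.
    Trailing steps that do not fill a whole patch are dropped.
    """
    n_steps = len(series)
    n_channels = len(series[0])
    features = []
    for start in range(0, n_steps - patch_len + 1, patch_len):
        patch = series[start:start + patch_len]
        feat = []
        for c in range(n_channels):
            vals = [row[c] for row in patch]
            mean = sum(vals) / patch_len
            var = sum((v - mean) ** 2 for v in vals) / patch_len
            feat.extend([mean, var ** 0.5])
        features.append(feat)
    return features  # shape: (n_patches, 2 * n_channels)

def project(features: Matrix, weight: Matrix, bias: List[float]) -> Matrix:
    """Linear projection layer (assumed): maps each feature vector of size
    feat_dim to the LLM embedding size d_model. weight is (feat_dim x d_model)."""
    columns = list(zip(*weight))  # d_model columns, each of length feat_dim
    return [
        [sum(f * w for f, w in zip(feat, col)) + b for col, b in zip(columns, bias)]
        for feat in features
    ]

def build_llm_input(ts_tokens: Matrix, text_embeddings: Matrix) -> Matrix:
    """Assumed fusion strategy: prepend projected time-series tokens to the
    text-token embeddings so the LLM sees one combined input sequence."""
    return ts_tokens + text_embeddings

# Illustrative usage with random data and random (untrained) weights.
random.seed(0)
d_model = 8  # hypothetical LLM embedding size
# 20 time steps x 3 channels: speed, acceleration, steering angle (synthetic values)
series = [
    [random.uniform(0, 30), random.uniform(-3, 3), random.uniform(-0.5, 0.5)]
    for _ in range(20)
]
feats = patch_encode(series, patch_len=5)  # 4 patches x 6 features
feat_dim = len(feats[0])
weight = [[random.gauss(0, 0.1) for _ in range(d_model)] for _ in range(feat_dim)]
bias = [0.0] * d_model
ts_tokens = project(feats, weight, bias)  # 4 x d_model
text_embeddings = [[0.0] * d_model for _ in range(6)]  # placeholder prompt embeddings
llm_input = build_llm_input(ts_tokens, text_embeddings)
print(len(llm_input), len(llm_input[0]))  # 10 8
```

The random weights here only demonstrate the tensor shapes; in a trained system the encoder and projection would typically be learned modules, and the exact fusion scheme used by TSLM may differ from simple prepending.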
