The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

3:45 PM - 4:00 PM JST (6:45 AM - 7:00 AM UTC)

[2E5-GS-10o-02] Design of a Multimodal LLM for Time-Series Data Interpretation and Driving Behavior Explanation

〇Shugo Matsusaka¹, Koichi Seki¹, Hideaki Bunazawa¹, Takuya Shintate², Shuheng You², Yongpeng Cao², Xi Xue², Akira Yoshida², Kunio Suzuki² (1. Toyota Motor Corporation, 2. NABLAS Inc.)

Keywords:

Large Language Model, Multimodal AI, Time-Series Data, Caption Generation, LLM as a Judge

Recent advances in Large Language Models (LLMs) have expanded their multimodal capabilities to handle diverse data types such as images, videos, and audio. In the automotive industry, time-series data such as vehicle speed, acceleration, and steering angle have been used to understand driving situations and detect anomalies, but traditional approaches require a separate model for each task and are limited in accuracy. To address these issues, this study proposes a Time-Series Language Model (TSLM), a novel multimodal LLM architecture that combines a time-series encoder with a projection layer so that the extracted time-series features can be fed directly into an LLM. For a driving behavior explanation task, we constructed a dataset of time-series and text pairs from real driving data and trained the model on it. In an evaluation using an LLM as a judge, the proposed method outperformed the baseline Vision Language Models (VLMs).
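
The encoder-plus-projection idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the encoder, the projection, or the LLM, so every name below (`split_into_patches`, `encode_patch`, `make_projection`, `project`, `build_llm_input`), the hand-crafted patch statistics, and the random projection weights are hypothetical stand-ins. The sketch only shows the data flow: a signal is cut into patches, each patch is encoded into a feature vector, a linear projection maps that vector to the LLM's embedding width, and the resulting "time-series tokens" are placed in front of the text embeddings.

```python
import random
from typing import List

Vector = List[float]


def split_into_patches(series: Vector, patch_len: int) -> List[Vector]:
    """Cut a 1-D signal into non-overlapping patches, dropping any remainder."""
    n_patches = len(series) // patch_len
    return [series[i * patch_len:(i + 1) * patch_len] for i in range(n_patches)]


def encode_patch(patch: Vector) -> Vector:
    """Toy stand-in for a learned time-series encoder (assumption):
    returns the mean, the value range, and the net change of the patch."""
    mean = sum(patch) / len(patch)
    return [mean, max(patch) - min(patch), patch[-1] - patch[0]]


def make_projection(in_dim: int, out_dim: int, seed: int = 0) -> List[Vector]:
    """Random (out_dim x in_dim) weight matrix; in a real model these
    weights would be learned during training."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def project(features: Vector, weights: List[Vector]) -> Vector:
    """Linear projection from encoder feature space to LLM embedding space."""
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def build_llm_input(series: Vector, text_embeddings: List[Vector],
                    weights: List[Vector], patch_len: int) -> List[Vector]:
    """Prepend projected time-series tokens to the text token embeddings."""
    ts_tokens = [project(encode_patch(p), weights)
                 for p in split_into_patches(series, patch_len)]
    return ts_tokens + text_embeddings


# Hypothetical vehicle-speed trace and a 4-dimensional embedding space.
EMBED_DIM = 4
speed = [0.0, 1.0, 2.0, 4.0, 6.0, 6.0, 5.0, 3.0]
weights = make_projection(in_dim=3, out_dim=EMBED_DIM)
prompt_embeddings = [[0.0] * EMBED_DIM for _ in range(2)]  # placeholder text tokens

sequence = build_llm_input(speed, prompt_embeddings, weights, patch_len=4)
print(len(sequence), len(sequence[0]))  # 2 time-series tokens + 2 text tokens, width 4
```

In this sketch the combined sequence would then be passed to the LLM in place of ordinary token embeddings. Under that assumption, only the encoder and the projection need to know anything about the signal format.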
