Presentation Information
[5M3-GS-2h-04] Multimodal Music Emotion Recognition with Integrated Lyrics Semantic Representations and Functional Harmony, and Contribution Analysis
〇Xin Yuan1, Ayako Yamagiwa1, Tengfei Shao1, Masayuki Goto1 (1. Waseda University)
Keywords:
Music Emotion Recognition, Multimodal Fusion, BERT / RoBERTa, Functional Harmony
Previous studies on music emotion recognition have mainly focused on text-based methods, while the quantitative effects of musical structures such as harmony on emotion induction remain underexplored. Emotions such as loneliness depend strongly on musical context, making linguistic information alone insufficient and highlighting the need for multimodal analysis. We propose a multimodal music emotion recognition model that integrates lyric semantics, harmonic progressions, and acoustic features. Lyric semantics are extracted with BERT, functional harmony (in Roman numeral notation) is encoded with RoBERTa, and statistical acoustic features such as pitch are derived from MIDI data. These modalities are combined through a fusion layer, allowing emotions to be modeled at both the abstract music-theoretical level and the concrete acoustic level. An ablation study quantifies the contribution of each modality across 12 emotion categories, aiming to elucidate the mechanisms of music-induced emotion. We further evaluate the complementary role of musical features for low-arousal emotions that are difficult to distinguish from linguistic information alone, providing insights for advanced music recommendation and creative-support systems.
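To make the fusion design concrete, below is a minimal PyTorch sketch of the architecture described in the abstract: a BERT encoder for lyrics, a RoBERTa encoder for Roman-numeral harmony sequences, and a vector of statistical MIDI features, concatenated and passed through a fusion layer that outputs logits over 12 emotion categories. The checkpoint names, feature dimensions, and [CLS]-pooling choice are illustrative assumptions, not details taken from the presentation.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class MultimodalEmotionClassifier(nn.Module):
    """Concatenation-based late-fusion sketch (assumed design):
    BERT lyrics embedding + RoBERTa harmony embedding + statistical
    MIDI features -> fusion layer -> logits over 12 emotion categories."""

    def __init__(self, midi_feat_dim=16, hidden_dim=256, num_emotions=12):
        super().__init__()
        # Hypothetical base checkpoints; the actual models/checkpoints may differ.
        self.lyrics_encoder = AutoModel.from_pretrained("bert-base-uncased")
        self.harmony_encoder = AutoModel.from_pretrained("roberta-base")
        fused_dim = (self.lyrics_encoder.config.hidden_size
                     + self.harmony_encoder.config.hidden_size
                     + midi_feat_dim)
        self.fusion = nn.Sequential(
            nn.Linear(fused_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, num_emotions),
        )

    def forward(self, lyrics_inputs, harmony_inputs, midi_feats):
        # [CLS]-token pooling for both text encoders (an assumption).
        lyr = self.lyrics_encoder(**lyrics_inputs).last_hidden_state[:, 0]
        harm = self.harmony_encoder(**harmony_inputs).last_hidden_state[:, 0]
        # Fuse all three modalities by concatenation, then classify.
        fused = torch.cat([lyr, harm, midi_feats], dim=-1)
        return self.fusion(fused)


# Usage example with placeholder inputs.
tok_lyrics = AutoTokenizer.from_pretrained("bert-base-uncased")
tok_harmony = AutoTokenizer.from_pretrained("roberta-base")
model = MultimodalEmotionClassifier()

lyrics = tok_lyrics(["I walk these empty streets alone"], return_tensors="pt")
harmony = tok_harmony(["I vi IV V I"], return_tensors="pt")
midi = torch.randn(1, 16)  # placeholder statistical MIDI feature vector

logits = model(lyrics, harmony, midi)  # shape: (1, 12)
```

Under this kind of design, a modality ablation of the sort the abstract describes could be run by zeroing one modality's embedding before concatenation and measuring the resulting drop in performance per emotion category.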
