Presentation Information
[5M3-GS-2h-04] Multimodal Music Emotion Recognition with Integrated Lyrics Semantic Representations and Functional Harmony, and Contribution Analysis
〇Xin Yuan1, Ayako Yamagiwa1, Tengfei Shao1, Masayuki Goto1 (1. Waseda University)
Keywords:
Music Emotion Recognition, Multimodal Fusion, BERT / RoBERTa, Functional Harmony
Previous studies on music emotion recognition have mainly focused on text-based methods, while the quantitative effects of musical structures such as harmony on emotion induction remain underexplored. Emotions such as loneliness depend strongly on musical context, making linguistic information alone insufficient and highlighting the need for multimodal analysis. We propose a multimodal music emotion recognition model that integrates lyric semantics, harmonic progressions, and acoustic features. Lyric semantics are extracted with BERT, functional harmony (in Roman numeral notation) is encoded with RoBERTa, and statistical acoustic features such as pitch are derived from MIDI data. These modalities are combined through a fusion layer, allowing emotions to be modeled at both the abstract music-theoretical level and the concrete acoustic level. An ablation study quantifies the contribution of each modality across 12 emotion categories, aiming to elucidate the mechanisms of music-induced emotion. We further evaluate the complementary role of musical features for low-arousal emotions that are difficult to distinguish from linguistic information alone, providing insights for advanced music recommendation and creative-support systems.
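To make the fusion design concrete, below is a minimal PyTorch sketch of the architecture described in the abstract: a BERT encoder for lyrics, a RoBERTa encoder for Roman-numeral harmony sequences, and a vector of statistical MIDI features, concatenated and passed through a fusion layer that outputs logits over 12 emotion categories. The checkpoint names, feature dimensions, and [CLS]-pooling choice are illustrative assumptions, not details taken from the presentation.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer


class MultimodalEmotionClassifier(nn.Module):
    """Concatenation-based late-fusion sketch (assumed design):
    BERT lyrics embedding + RoBERTa harmony embedding + statistical
    MIDI features -> fusion layer -> logits over 12 emotion categories."""

    def __init__(self, midi_feat_dim=16, hidden_dim=256, num_emotions=12):
        super().__init__()
        # Hypothetical base checkpoints; the actual models/checkpoints may differ.
        self.lyrics_encoder = AutoModel.from_pretrained("bert-base-uncased")
        self.harmony_encoder = AutoModel.from_pretrained("roberta-base")
        fused_dim = (self.lyrics_encoder.config.hidden_size
                     + self.harmony_encoder.config.hidden_size
                     + midi_feat_dim)
        self.fusion = nn.Sequential(
            nn.Linear(fused_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.1),
            nn.Linear(hidden_dim, num_emotions),
        )

    def forward(self, lyrics_inputs, harmony_inputs, midi_feats):
        # [CLS]-token pooling for both text encoders (an assumption).
        lyr = self.lyrics_encoder(**lyrics_inputs).last_hidden_state[:, 0]
        harm = self.harmony_encoder(**harmony_inputs).last_hidden_state[:, 0]
        # Fuse all three modalities by concatenation, then classify.
        fused = torch.cat([lyr, harm, midi_feats], dim=-1)
        return self.fusion(fused)


# Usage example with placeholder inputs.
tok_lyrics = AutoTokenizer.from_pretrained("bert-base-uncased")
tok_harmony = AutoTokenizer.from_pretrained("roberta-base")
model = MultimodalEmotionClassifier()

lyrics = tok_lyrics(["I walk these empty streets alone"], return_tensors="pt")
harmony = tok_harmony(["I vi IV V I"], return_tensors="pt")
midi = torch.randn(1, 16)  # placeholder statistical MIDI feature vector

logits = model(lyrics, harmony, midi)  # shape: (1, 12)
```

Under this kind of design, a modality ablation of the sort the abstract describes could be run by zeroing one modality's embedding before concatenation and measuring the resulting drop in performance per emotion category.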
