The 40th Annual Conference of the Japanese Society for Artificial Intelligence, 2026

Presentation Information

[1Yin-A-23] Speech Emotion Recognition Using Vote-Rate Distributions, Fine-Tuning, and Ambiguity-Aware Learning: Comparative Evaluation Using CREMA-D and wav2vec2 Emotion Expression

〇Takuma Endo¹, Shuji Shinohara¹, Takeshi Takano², Nobuhito Manome³, Masakazu Higuchi⁴ (1. Tokyo Denki University, 2. University of Texas San Antonio, 3. Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, the University of Tokyo, 4. Graduate School of Health Innovation, Kanagawa University of Human Services)

Keywords:

Speech Emotion Recognition, Ambiguity, Objective label

In speech emotion recognition, objective labels (majority votes) and subjective labels (speaker intent) often disagree.
This study investigates a two-stage learning approach on CREMA-D, in which an objective model trained on six-dimensional vote rates is subsequently fine-tuned using subjective labels.
The proposed method achieves an accuracy of 0.797, compared with 0.760 for subjective-only training, and an ambiguity-based analysis reveals the factors contributing to this improvement.
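
As a rough illustration of the quantities mentioned in the abstract, the following sketch converts rater votes for one clip into a six-dimensional vote-rate vector, derives the majority-vote (objective) label, and computes a soft-label cross-entropy that a first training stage could minimize. The abstract does not define its ambiguity measure or loss; the normalized entropy and the cross-entropy used here are assumptions made for illustration, and all function names are hypothetical.

```python
import math
from collections import Counter

# The six CREMA-D emotion categories (the ordering here is an arbitrary choice).
EMOTIONS = ["anger", "disgust", "fear", "happy", "neutral", "sad"]


def vote_rates(votes):
    """Turn a list of rater votes into a six-dimensional vote-rate vector."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    unknown = set(counts) - set(EMOTIONS)
    if unknown:
        raise ValueError(f"unknown emotion labels: {sorted(unknown)}")
    total = len(votes)
    return [counts[e] / total for e in EMOTIONS]


def majority_label(rates):
    """Objective label: emotion with the highest vote rate.

    Ties are broken by the order of EMOTIONS (an assumption).
    """
    best = max(range(len(rates)), key=rates.__getitem__)
    return EMOTIONS[best]


def normalized_entropy(rates):
    """Hypothetical ambiguity score in [0, 1]: Shannon entropy / log(6).

    0 means all raters agreed; 1 means votes were spread uniformly.
    """
    h = sum(-p * math.log(p) for p in rates if p > 0)
    return h / math.log(len(rates))


def soft_cross_entropy(rates, predicted, eps=1e-12):
    """Cross-entropy between target vote rates and predicted probabilities.

    One possible stage-1 loss for training on vote-rate distributions
    (an assumption, not the authors' stated objective).
    """
    return -sum(t * math.log(max(q, eps)) for t, q in zip(rates, predicted))


if __name__ == "__main__":
    votes = ["anger"] * 6 + ["disgust"] * 3 + ["neutral"]
    rates = vote_rates(votes)
    print("vote rates:", rates)
    print("objective label:", majority_label(rates))
    print("ambiguity (normalized entropy):", round(normalized_entropy(rates), 3))
```

Under this sketch, the second stage would start from the model fitted to vote rates and continue training with the speaker-intended (subjective) label as a one-hot target, while the ambiguity score could be used to group clips when comparing accuracy.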
