Presentation Information
[1Yin-A-23] Speech Emotion Recognition Using Vote Rate Distributions Fine-Tuning and Ambiguity-Aware Learning: Comparative Evaluation Using CREMA-D and wav2vec2 Emotion Expression
〇Takuma Endo1, Shuji Shinohara1, Takeshi Takano2, Nobuhito Manome3, Masakazu Higuchi4 (1. Tokyo Denki University, 2. University of Texas San Antonio, 3. Center for Disease Biology and Integrative Medicine, Graduate School of Medicine, the University of Tokyo, 4. Graduate School of Health Innovation, Kanagawa University of Human Services)
Keywords:
Speech Emotion Recognition, Ambiguity, Objective label
In speech emotion recognition, objective labels (majority votes) and subjective labels (speaker intent) often disagree.
This study investigates a two-stage learning approach on CREMA-D, in which an objective model trained on six-dimensional vote rates is subsequently fine-tuned using subjective labels.
The proposed method achieves an accuracy of 0.797, compared to 0.760 with subjective-only training, and an ambiguity-based analysis reveals the factors contributing to this improvement.
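The two training targets described above can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not state the loss function or how ambiguity is measured, so the soft-label cross-entropy and the normalized-entropy ambiguity score below are assumptions, and the function names, example votes, and logits are hypothetical. Only the six CREMA-D emotion categories come from the dataset itself.

```python
import math
from collections import Counter

# The six emotion categories annotated in CREMA-D.
EMOTIONS = ["anger", "disgust", "fear", "happy", "neutral", "sad"]


def vote_rates(votes):
    """Convert a list of rater votes into a six-dimensional vote-rate vector."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    return [counts[e] / len(votes) for e in EMOTIONS]


def one_hot(label):
    """Hard target for a single label (e.g. the speaker's intended emotion)."""
    return [1.0 if e == label else 0.0 for e in EMOTIONS]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [x / total for x in exps]


def soft_cross_entropy(logits, target):
    """Cross-entropy between a target distribution and softmax(logits).

    Assumed loss: with a vote-rate target it plays the role of the
    stage-1 (objective) loss, with a one-hot target the stage-2 loss.
    """
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def ambiguity(rates):
    """Assumed ambiguity score: entropy of the vote rates, scaled to [0, 1]."""
    entropy = -sum(r * math.log(r) for r in rates if r > 0)
    return entropy / math.log(len(rates))


# Hypothetical clip: ten raters, intended (subjective) emotion "anger".
votes = ["anger"] * 5 + ["disgust"] * 3 + ["neutral"] * 2
rates = vote_rates(votes)
logits = [2.0, 1.0, -1.0, -1.0, 0.5, -1.0]  # hypothetical model outputs

stage1_loss = soft_cross_entropy(logits, rates)             # objective target
stage2_loss = soft_cross_entropy(logits, one_hot("anger"))  # subjective target
clip_ambiguity = ambiguity(rates)
```

In a two-stage setup of this kind, the first loss would be minimized over the whole training set before the same model is fine-tuned with the second, and the ambiguity score could be used to group test clips when comparing accuracy across models.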
