Presentation Information
[1Yin-B-08] A Multimodal Emotion Recognition System Applicable to Both Individuals with Depressive Tendencies and Healthy Controls
〇Kaname Nakayama1, Kazuyuki Matumoto2, Minoru Yoshida2, Xin Kang2, Takaki Fukumori2, Munenaga Koda2, Keita Kiuchi3, Hidehiro Umehara4, Tomohiko Nakayama5, Masahito Nakataki5 (1. Graduate School of Sciences and Technology for Innovation, Tokushima University, 2. Graduate School of Technology, Industrial and Social Sciences, Tokushima University, 3. National Institute of Occupational Safety and Health, Japan Organization of Occupational Health and Safety, 4. Accessibility Support Department, 5. Department of Psychiatry, Graduate School of Biomedical Science, Tokushima University, Tokushima, Japan)
Keywords:
Emotion Estimation, Mental Health, Multimodal, Feature Fusion, Deep Learning
This study proposes a multimodal emotion estimation system applicable to both depressed and healthy individuals. Conventional models often fail for depressed subjects because their blunted affect is treated as noise. We address this by focusing on the "emotional inconsistency" between verbal context (topic) and physical expression. Our method integrates audio, text, and video features with topic and depression indicators, trained on a mixed dataset of depressed and healthy participants. This approach establishes a healthy "standard" against which depressive "distortions" are detected as significant features. Experiments demonstrate robust estimation across mental states, identifying context and vocal prosody as critical indicators.
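The abstract does not specify the fusion mechanism, so the following is only a minimal PyTorch sketch assuming simple concatenation-based (late) fusion of per-modality feature vectors with the topic and depression-indicator features. All class names, feature dimensions, and the classifier head are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MultimodalFusionClassifier(nn.Module):
    """Hypothetical sketch: late fusion of audio, text, and video features
    with topic and depression-indicator features. Dimensions are assumed."""

    def __init__(self, audio_dim=128, text_dim=768, video_dim=512,
                 topic_dim=16, depression_dim=1, hidden_dim=256,
                 num_emotions=6):
        super().__init__()
        fused_dim = audio_dim + text_dim + video_dim + topic_dim + depression_dim
        self.classifier = nn.Sequential(
            nn.Linear(fused_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, num_emotions),
        )

    def forward(self, audio_feat, text_feat, video_feat,
                topic_feat, depression_feat):
        # Concatenate per-modality features with the contextual
        # indicators, then map to emotion-class logits.
        fused = torch.cat(
            [audio_feat, text_feat, video_feat, topic_feat, depression_feat],
            dim=-1,
        )
        return self.classifier(fused)


# Example usage with a batch of 4 samples and dummy feature vectors.
model = MultimodalFusionClassifier()
logits = model(
    torch.randn(4, 128),  # audio (e.g., prosodic features)
    torch.randn(4, 768),  # text (e.g., sentence embeddings)
    torch.randn(4, 512),  # video (e.g., facial features)
    torch.randn(4, 16),   # topic indicator
    torch.randn(4, 1),    # depression indicator
)
print(logits.shape)  # torch.Size([4, 6])
```

Feeding the topic and depression indicators into the fused representation is one plausible way to let the model weigh "emotional inconsistency" between what is said and how it is expressed; the actual fusion strategy used in the study may differ.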
