NU author(s): Rui Sun
This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
© 2023 by the authors.
Researchers have recently focused on multimodal emotion recognition, but recognizing emotions in multi-party dialogue scenarios remains difficult. Most studies have used only the text and audio modalities and have ignored the video modality. To address this, we propose M2ER, a multimodal emotion recognition scheme for multi-party dialogue scenarios. Because multiple faces can appear in the same frame of the video modality, M2ER uses multi-face localization for speaker recognition to eliminate interference from non-speakers. An attention mechanism is used to fuse and classify the different modalities. We conducted extensive unimodal and multimodal fusion experiments on the multi-party dialogue dataset MELD. The results show that M2ER achieves better emotion recognition than the baseline model in both the text and audio modalities. In the video modality, the proposed speaker recognition method improves emotion recognition performance by 6.58% compared with the method without speaker recognition. In addition, the attention-based multimodal fusion also outperforms the baseline fusion model.
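The attention-based fusion step mentioned in the abstract can be illustrated with a generic sketch. This is not the M2ER architecture, whose layer details are not given in this record. It assumes that each modality (text, audio, video) has already been encoded into a feature vector of the same length, scores each vector against a hypothetical learned query vector using scaled dot-product attention, and returns the softmax-weighted sum. The names `attention_fuse` and `classify`, the query vector, and the classifier weights are all hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    features: Dict[str, List[float]], query: List[float]
) -> Tuple[List[float], Dict[str, float]]:
    """Fuse per-modality feature vectors with scaled dot-product attention.

    Assumption: every modality vector has the same length as `query`
    (the query stands in for a learned parameter in a trained model).
    Returns the fused vector and the attention weight of each modality.
    """
    names = list(features)
    scale = math.sqrt(len(query))
    scores = [dot(features[n], query) / scale for n in names]
    weights = softmax(scores)
    fused = [0.0] * len(query)
    for w, n in zip(weights, names):
        for i, x in enumerate(features[n]):
            fused[i] += w * x
    return fused, dict(zip(names, weights))


def classify(fused: List[float], weight_rows: List[List[float]], labels: List[str]) -> str:
    """Hypothetical linear head: one weight row per emotion label, pick the top logit."""
    logits = [dot(row, fused) for row in weight_rows]
    best = max(range(len(labels)), key=lambda i: logits[i])
    return labels[best]


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for encoder outputs (illustrative values only).
    feats = {"text": [1.0, 0.0], "audio": [0.0, 1.0], "video": [1.0, 1.0]}
    fused, weights = attention_fuse(feats, query=[1.0, 0.0])
    print(weights)
    print(classify(fused, [[1.0, 0.0], [0.0, 1.0]], ["joy", "neutral"]))
```

In this sketch the video vector would be the one extracted from the recognized speaker's face rather than from every face in the frame; the attention weights then let the model lean on whichever modality is most informative for a given utterance. A real system would learn the query and classifier weights and would typically use the seven MELD emotion labels.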
Author(s): Zhang B, Yang X, Wang G, Wang Y, Sun R
Publication type: Article
Publication status: Published
Journal: Applied Sciences
Year: 2023
Volume: 13
Issue: 20
Online publication date: 16/10/2023
Acceptance date: 11/10/2023
Date deposited: 20/05/2024
ISSN (electronic): 2076-3417
Publisher: MDPI
URL: https://doi.org/10.3390/app132011340
DOI: 10.3390/app132011340
Data Access Statement: Not applicable.