Noise-resistant multimodal transformer for emotion recognition
Journal article
Liu, Yuanyuan, Zhang, Haoyu, Zhan, Yibing, Chen, Zijing, Yin, Guanghao, Wei, Lin and Chen, Zhe. (2025). Noise-resistant multimodal transformer for emotion recognition. International Journal of Computer Vision. pp. 3020-3040. https://doi.org/10.1007/s11263-024-02304-3
Authors | Liu, Yuanyuan, Zhang, Haoyu, Zhan, Yibing, Chen, Zijing, Yin, Guanghao, Wei, Lin and Chen, Zhe |
---|---|
Abstract | Multimodal emotion recognition identifies human emotions from various data modalities like video, text, and audio. However, we found that this task can be easily affected by noisy information that does not contain useful semantics and may occur at different locations of a multimodal input sequence. To this end, we present a novel paradigm that attempts to extract noise-resistant features in its pipeline and introduces a noise-aware learning scheme to effectively improve the robustness of multimodal emotion understanding against noisy information. Our new pipeline, namely Noise-Resistant Multimodal Transformer (NORM-TR), mainly introduces a Noise-Resistant Generic Feature (NRGF) extractor and a multimodal fusion Transformer for the multimodal emotion recognition task. In particular, we make the NRGF extractor learn to provide a generic and disturbance-insensitive representation so that consistent and meaningful semantics can be obtained. Furthermore, we apply a multimodal fusion Transformer to incorporate Multimodal Features (MFs) of multimodal inputs (serving as the key and value) based on their relations to the NRGF (serving as the query). Therefore, the possible insensitive but useful information of NRGF could be complemented by MFs that contain more details, achieving more accurate emotion understanding while maintaining robustness against noises. To train the NORM-TR properly, our proposed noise-aware learning scheme complements normal emotion recognition losses by enhancing the learning against noises. Our learning scheme explicitly adds noises to either all the modalities or a specific modality at random locations of a multimodal input sequence. We correspondingly introduce two adversarial losses to encourage the NRGF extractor to learn to extract the NRGFs invariant to the added noises, thus facilitating the NORM-TR to achieve more favorable multimodal emotion recognition performance. 
In practice, extensive experiments demonstrate the effectiveness of NORM-TR and the noise-aware learning scheme in dealing with both explicitly added noisy information and normal multimodal sequences with implicit noise. On several popular multimodal datasets (e.g., MOSI, MOSEI, IEMOCAP, and RML), our NORM-TR achieves state-of-the-art performance and outperforms existing methods by a large margin, which demonstrates that the ability to resist noisy information in multimodal input is important for effective emotion recognition. |
Keywords | multimodal; emotion recognition; transformer; noise-resistant generic feature; noise-aware learning scheme |
Year | 2025 |
Journal | International Journal of Computer Vision |
Journal citation | pp. 3020-3040 |
Publisher | Springer |
ISSN | 0920-5691 |
Digital Object Identifier (DOI) | https://doi.org/10.1007/s11263-024-02304-3 |
Scopus EID | 2-s2.0-105003158610 |
Page range | 3020-3040 |
Funder | National Natural Science Foundation of China (NSFC); Natural Science Foundation of Hubei Province; Major Science and Technology Projects in Yunnan Province |
Publisher's version | License All rights reserved File Access Level Controlled |
Output status | In press |
Publication process dates | Deposited 16 Jun 2025 |
Grant ID | 62076227; 62002090; 2023AFB57; 202202AD080007 |
Additional information | © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024 |
https://acuresearchbank.acu.edu.au/item/91z31/noise-resistant-multimodal-transformer-for-emotion-recognition
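
The abstract describes two mechanisms that a small example may make easier to picture: noise added at random locations of an input sequence, and a fusion step in which the NRGF acts as the query while the multimodal features act as key and value. The following is a minimal sketch under stated assumptions, not the authors' implementation. It uses plain single-head scaled dot-product attention with no learned projections, additive Gaussian noise, and toy dimensions. The function names `add_noise_at_random_steps` and `cross_attention` are hypothetical and do not come from the paper.

```python
import math
import random


def add_noise_at_random_steps(sequence, fraction, rng, scale=1.0):
    """Add Gaussian noise to a random fraction of time steps.

    Assumption: the noise is additive and Gaussian; the abstract only says
    that noise is added at random locations of the input sequence.
    """
    n_noisy = int(round(fraction * len(sequence)))
    noisy_steps = set(rng.sample(range(len(sequence)), n_noisy))
    noisy_sequence = []
    for t, step in enumerate(sequence):
        if t in noisy_steps:
            noisy_sequence.append([x + rng.gauss(0.0, scale) for x in step])
        else:
            noisy_sequence.append(list(step))
    return noisy_sequence, noisy_steps


def cross_attention(query, keys, values):
    """Single-head scaled dot-product attention without learned projections.

    query:  list of vectors (stand-in for the NRGF sequence)
    keys:   list of vectors (stand-in for the multimodal features)
    values: list of vectors, one per key
    """
    d = len(query[0])
    dim_v = len(values[0])
    output = []
    for q in query:
        # Similarity of this query step to every key step.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        # Numerically stable softmax over key positions.
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        output.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        )
    return output


# Toy usage: 6 time steps of 4-dim multimodal features, 3 NRGF query steps.
rng = random.Random(0)
mf = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
nrgf = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(3)]
noisy_mf, noisy_steps = add_noise_at_random_steps(mf, 0.5, rng)
fused = cross_attention(nrgf, noisy_mf, noisy_mf)
```

In this sketch the output has one fused vector per query step, so the sequence length follows the NRGF rather than the noisy features. The adversarial losses mentioned in the abstract are not modelled here.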