NU author(s): Dr Huizhi Liang
This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
© 2026 by the authors.

Abstract: Dysarthria in post-stroke patients is often accompanied by central facial paralysis, which impairs facial motor control and emotional expression. Current assessments rely on acoustic modalities alone, overlooking facial pathological cues and their correlation with emotional expression, which hinders comprehensive disease assessment. To address this, we propose a multimodal severity classification framework that integrates facial and acoustic features. First, a multi-level annotation algorithm based on a pre-trained model and motion amplitude was designed to overcome data scarcity. Second, facial topology was modeled using Delaunay triangulation, with spatial relationships captured via graph convolutional networks (GCNs) and abnormal muscle coordination quantified using facial action units (AUs). Finally, we proposed a multimodal feature-fusion framework in which facial visual features compensate for the acoustic modality in disease classification. Experimental results on the THE-POSSD dataset demonstrate an accuracy of 92.0% and an F1 score of 91.6%, significantly outperforming single-modality baselines. This study reveals changes in patients' facial movements and sensitive facial regions under different emotional states, verifies the compensatory ability of the visual modality for the auditory modality, and demonstrates the potential of this multimodal framework for objective assessment and future clinical application in speech disorders.
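As a minimal sketch of the graph-construction step the abstract describes: once facial landmarks have been Delaunay-triangulated, the triangle edges define an adjacency matrix, which a standard GCN layer then symmetrically normalizes with self-loops (the Kipf–Welling form, D̂⁻¹ᐟ²(A+I)D̂⁻¹ᐟ²). The triangle indices below are illustrative toy values, not the paper's actual landmark topology.

```python
import math

def adjacency_from_triangles(n, triangles):
    """Build a symmetric adjacency matrix from Delaunay triangle index triples."""
    A = [[0.0] * n for _ in range(n)]
    for a, b, c in triangles:
        for i, j in ((a, b), (b, c), (a, c)):
            A[i][j] = A[j][i] = 1.0
    return A

def gcn_normalize(A):
    """Symmetrically normalize with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    n = len(A)
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    return [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]

# Toy example: 4 landmarks, two triangles sharing the edge (1, 2).
A = adjacency_from_triangles(4, [(0, 1, 2), (1, 2, 3)])
A_norm = gcn_normalize(A)
```

Node features (e.g. landmark coordinates or AU activations) would then be propagated as `A_norm @ X @ W` in each GCN layer; the actual feature choices and layer count are not specified here.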
Author(s): Duan S, Guo Y, Fu L, Li F, Dong X, Liang H, Zhang W
Publication type: Article
Publication status: Published
Journal: Sensors
Year: 2026
Volume: 26
Issue: 4
Online publication date: 12/02/2026
Acceptance date: 11/02/2026
Date deposited: 09/03/2026
ISSN (electronic): 1424-8220
Publisher: MDPI
URL: https://doi.org/10.3390/s26041239
DOI: 10.3390/s26041239
Data Access Statement: The data presented in this study are available on request from the corresponding author. The data are not publicly available due to privacy and ethical restrictions, as they contain sensitive multimodal information (facial videos and acoustic recordings) that could compromise the privacy of the participants.