Pathological speech classification with a dual-branch residual network considering the fluency features of speech expression

Duan, S; Cheng, Y; Qin, Z; Zhu, T; Li, F; Liang, Y; Liang, H; Zhang, W

doi:10.1016/j.specom.2026.103417

Lookup NU author(s): Ting Zhu

Abstract

© 2026 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

The speech of patients with dysarthria is often accompanied by symptoms such as involuntary pauses and reduced articulatory coherence, which are crucial for disease assessment. Existing pathological speech classification models have two main limitations. First, they rely on feature sets carried over from normal controls and neglect disfluency-related features, so they fail to capture sufficient pathological information. Second, their network architectures do not adequately address vanishing or exploding gradients during deep training, and they lack the capability to deeply explore features along the channel and spatial dimensions, which in turn reduces classification accuracy. To overcome these challenges, this paper proposes a dual-branch residual pathological speech classification network with speech fluency feature compensation. Pause and coherence features are extracted based on the THE-POSSD dataset and integrated with MFCC and formant features to construct a comprehensive feature set. For the network architecture, wideband and narrowband spectrograms are used as dual inputs, and an adaptive feature extraction residual block with skip connections is employed to address gradient-related issues and extract deeper features. Additionally, the dual-branch features are fused using a complementary fusion module, which is weighted and optimized in conjunction with the multi-feature set to enhance recognition performance. Experimental results demonstrate that the proposed model achieves an accuracy of 96.21%, a 2.5% improvement over the baseline, while precision, recall, and F1 score increase by 4.94%, 4.99%, and 5.07%, respectively. These findings validate the model's effectiveness and robustness, establishing it as a reliable tool for the clinical auxiliary diagnosis of speech disorders.
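The abstract does not say how the pause features are computed. As an illustration only, the sketch below shows one common way to derive simple pause statistics from short-time energy. It is not the authors' method: the function names (`frame_energies`, `pause_features`), the 25 ms / 10 ms framing, the energy threshold, and the 200 ms minimum pause length are all assumptions chosen for the example.

```python
from typing import Dict, List


def frame_energies(samples: List[float], frame_len: int, hop: int) -> List[float]:
    """Mean squared amplitude of each full frame (hypothetical helper)."""
    energies = []
    last_start = max(len(samples) - frame_len + 1, 0)
    for start in range(0, last_start, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def pause_features(
    samples: List[float],
    sample_rate: int,
    frame_ms: float = 25.0,          # assumed frame length
    hop_ms: float = 10.0,            # assumed hop size
    energy_threshold: float = 1e-4,  # assumed silence threshold
    min_pause_ms: float = 200.0,     # assumed minimum pause duration
) -> Dict[str, float]:
    """Count low-energy runs that last at least min_pause_ms."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    hop = max(1, int(sample_rate * hop_ms / 1000))
    energies = frame_energies(samples, frame_len, hop)
    min_frames = max(1, round(min_pause_ms / hop_ms))

    pause_count = 0
    paused_frames = 0
    run = 0
    # The trailing sentinel closes a silent run that reaches the end of the signal.
    for energy in energies + [float("inf")]:
        if energy < energy_threshold:
            run += 1
        else:
            if run >= min_frames:
                pause_count += 1
                paused_frames += run
            run = 0

    total = len(energies)
    pause_seconds = paused_frames * hop_ms / 1000
    return {
        "pause_count": pause_count,
        "pause_ratio": paused_frames / total if total else 0.0,
        "mean_pause_s": pause_seconds / pause_count if pause_count else 0.0,
    }
```

Statistics of this kind could sit alongside MFCC and formant values in a combined feature vector, but how the paper actually defines, weights, and fuses its pause and coherence features is not stated in the abstract.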

Publication metadata

Author(s): Duan S, Cheng Y, Qin Z, Zhu T, Li F, Liang Y, Liang H, Zhang W

Publication type: Article

Publication status: Published

Journal: Speech Communication

Year: 2026

Volume: 182

Print publication date: 01/07/2026

Online publication date: 20/05/2026

Acceptance date: 18/05/2026

ISSN (print): 0167-6393

ISSN (electronic): 1872-7182

Publisher: Elsevier B.V.

URL: https://doi.org/10.1016/j.specom.2026.103417

DOI: 10.1016/j.specom.2026.103417
