
Open Access

A Multi-Scale Feature Refinement and Dual-Attention Enhanced Dynamic Convolutional Network for Speech-Based Depression and ADHD Assessment

Lookup NU author(s): Dr Shuanglin Li, Dr Mohsen Naqvi

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

In the area of affective computing, speech has been identified as a promising biomarker for assessing depression and attention deficit hyperactivity disorder (ADHD). These disorders manifest as abnormalities in speech across various frequency bands and exhibit temporal variations. Most existing work on speech features relies on the magnitude spectrogram, which discards phase information and does not consider the impact of different frequency bands on depression and ADHD detection. Motivated by these observations, we propose a novel multi-scale complex feature refinement and dynamic convolution attention-aware network to enhance speech-based assessment of depression and ADHD. Our approach incorporates three key components: a multi-scale complex feature refinement (MSFR) module, a dynamic convolutional neural network (Dy-CNN), and a dual-attention feature enhancement (DAFE) module. The MSFR module uses depth-wise convolutional networks to process both magnitude and phase inputs, selectively emphasizing frequency bands associated with depression and ADHD. Importantly, the Dy-CNN module employs an attention mechanism to autonomously generate multiple convolution kernels that adapt to the input features and capture relevant temporal dynamics linked to depression and ADHD. Additionally, the DAFE module enhances feature representation and detection performance by incorporating channel shuffle attention (CSA) and spatial axial attention (SAA) mechanisms, which leverage both inter- and intra-channel relationships and the time-frequency characteristics of the feature map. Extensive experiments conducted on four datasets, i.e., the publicly available AVEC2013, AVEC2014, and E-DAIC corpora and a self-collected authentic ADHD dataset, demonstrate that the proposed method outperforms previous approaches and exhibits superior generalization across different language settings (i.e., English, German) for speech-based depression and ADHD assessment.
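The core idea behind the Dy-CNN module described above — attention weights computed from the input select a convex combination of several candidate convolution kernels, so the effective kernel adapts per utterance — can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; the function and variable names (`dynamic_conv1d`, `attn_logits`, the toy statistics used as logits) are hypothetical, and a real model would learn the kernels and the attention network jointly.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def dynamic_conv1d(frames, kernels, attn_logits):
    """Weight K candidate kernels by input-dependent attention,
    then apply the aggregated kernel as one 1-D convolution.

    frames:      (T,)  feature sequence over time
    kernels:     (K, kernel_size) candidate kernels
    attn_logits: (K,)  unnormalized attention scores from the input
    """
    weights = softmax(attn_logits)                    # (K,), sums to 1
    kernel = np.tensordot(weights, kernels, axes=1)   # (kernel_size,)
    return np.convolve(frames, kernel, mode="same")   # (T,)

# Toy example: simple per-utterance statistics stand in for the
# learned attention network that would produce the logits.
rng = np.random.default_rng(0)
frames = rng.standard_normal(16)           # 1-D feature sequence
kernels = rng.standard_normal((4, 3))      # K=4 candidate kernels of size 3
attn_logits = np.array([frames.mean(), frames.std(),
                        frames.max(), frames.min()])
out = dynamic_conv1d(frames, kernels, attn_logits)
```

Because the softmax weights depend on `frames`, two different utterances are filtered by two different effective kernels, even though the candidate-kernel bank is shared — the property the abstract attributes to the Dy-CNN module.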


Publication metadata

Author(s): Li S, Song S, Naqvi SM

Publication type: Article

Publication status: Published

Journal: IEEE Transactions on Affective Computing

Year: 2025

Pages: epub ahead of print

Online publication date: 02/09/2025

Acceptance date: 01/09/2025

Date deposited: 08/09/2025

ISSN (print): 1949-3045

Publisher: IEEE

URL: https://doi.org/10.1109/TAFFC.2025.3604562

DOI: 10.1109/TAFFC.2025.3604562

ePrints DOI: 10.57711/8sb9-vm60

