U-Shaped Transformer with Frequency-Band Aware Attention for Speech Enhancement

Li, Y; Sun, Y; Wang, W; Naqvi, SM

doi:10.1109/TASLP.2023.3265839

Lookup NU author(s): Dr Yi Li, Dr Mohsen Naqvi

Downloads

Accepted version [.pdf]

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

Abstract

Recently, the Transformer has shown the potential to exploit long-range sequence dependencies in speech through self-attention. It has been introduced in single-channel speech enhancement to improve the accuracy of speech estimation from a noisy mixture. However, the amount of information represented across attention heads is often large, which increases computational complexity. To address this issue, axial attention has been proposed, which splits a 2-D attention into two 1-D attentions. In this paper, we develop a new method for speech enhancement that leverages axial attention: we generate time and frequency sub-attention maps by calculating the attention map along the time and frequency axes. Unlike conventional axial attention, the proposed method provides two parallel multi-head attentions, one for the time axis and one for the frequency axis. Moreover, we propose frequency-band aware attention, comprising high frequency-band attention (HFA) and low frequency-band attention (LFA), which facilitates the exploitation of information related to speech and noise in different frequency bands of the noisy mixture. To re-use high-resolution feature maps from the encoder, we design a U-shaped Transformer, which helps recover information lost in the high-level representations and further improves the speech estimation accuracy. Extensive experiments on four public datasets demonstrate the efficacy of the proposed method.
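The idea of axial attention described in the abstract can be illustrated with a small sketch. For a spectrogram-like feature map with T frames and F frequency bins, full 2-D self-attention compares every position with every other position, roughly (T·F)² scores, whereas attending along each axis separately needs about T·F·(T+F). The code below is not the authors' implementation. It is a minimal, single-head, projection-free illustration in plain Python, and several details are assumptions made only for the example: the function names, combining the two parallel branches by element-wise addition, and modelling band-aware attention by restricting frequency-axis attention to bins below or above a hypothetical `split` index.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    total = sum(es)
    return [e / total for e in es]

def attend_1d(seq):
    """Scaled dot-product self-attention over a list of feature vectors.
    Assumption: queries, keys and values are the inputs themselves
    (no learned projections, one head)."""
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        w = softmax(scores)
        out.append([sum(w[j] * seq[j][c] for j in range(len(seq)))
                    for c in range(d)])
    return out

def time_attention(x):
    """x[t][f] is a feature vector. Attend along time for each frequency bin."""
    T, F = len(x), len(x[0])
    cols = [attend_1d([x[t][f] for t in range(T)]) for f in range(F)]
    return [[cols[f][t] for f in range(F)] for t in range(T)]

def frequency_attention(x, split=None):
    """Attend along frequency within each frame.
    Assumption: if `split` is given (0 < split < F), bins below it attend
    only to each other (low band) and likewise for bins at or above it
    (high band)."""
    out = []
    for frame in x:
        if split is None:
            out.append(attend_1d(frame))
        else:
            out.append(attend_1d(frame[:split]) + attend_1d(frame[split:]))
    return out

def parallel_axial_attention(x, split=None):
    """Run both branches on the same input and add them element-wise.
    Assumption: addition is only one possible way to merge the branches."""
    ta = time_attention(x)
    fa = frequency_attention(x, split)
    return [[[a + b for a, b in zip(ta[t][f], fa[t][f])]
             for f in range(len(x[0]))] for t in range(len(x))]

def score_counts(T, F):
    """Number of attention scores: (full 2-D attention, axial attention)."""
    return (T * F) ** 2, T * F * (T + F)

# Toy input: 3 frames, 4 frequency bins, 2 features per bin.
T, F = 3, 4
x = [[[0.1 * (t + 1), 0.05 * (f + 1)] for f in range(F)] for t in range(T)]
y = parallel_axial_attention(x, split=2)
print(len(y), len(y[0]), len(y[0][0]))  # output keeps the input shape: 3 4 2
print(score_counts(100, 257))
```

The output has the same shape as the input, so a block like this could be stacked or placed inside an encoder-decoder structure. The score counts show why the axial factorisation is cheaper for typical spectrogram sizes.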

Publication metadata

Author(s): Li Y, Sun Y, Wang W, Naqvi SM

Publication type: Article

Publication status: Published

Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2023

Volume: 31

Pages: 1511-1521

Online publication date: 12/04/2023

Acceptance date: 03/04/2023

Date deposited: 05/04/2023

ISSN (print): 2329-9290

ISSN (electronic): 2329-9304

Publisher: IEEE/ACM

URL: https://doi.org/10.1109/TASLP.2023.3265839

DOI: 10.1109/TASLP.2023.3265839

ePrints DOI: 10.57711/e02q-0735
