Lookup NU author(s): Dr Yi Li, Dr Mohsen Naqvi
This is the authors' accepted manuscript of an article that has been accepted and is due to be published in its final definitive form by the Institution of Engineering and Technology (IET), 2021.
For re-use rights please refer to the publisher's terms and conditions.
Supervised single-channel speech enhancement presents one mixture recording at the input of a neural network and updates the network parameters so that the output is the reconstructed speech signal. However, current neural-network-based single-channel speech enhancement methods cannot fully exploit the specific frequency range of speech signals under limited computational complexity. In this paper, we study the power spectral density (PSD) of mixtures of human speech and noise interference. Based on the premise that the speech signal is distributed mainly in the lower band, we propose a method that trains signal approximation (SA) based neural networks on the lower frequency band of the speech mixture to improve performance. To realize the lower-band approach for single-channel speech enhancement, the method uses a long short-term memory (LSTM) block to exploit the short-time Fourier transform (STFT) of the desired frequency range. Furthermore, to improve speech enhancement performance in reverberant room environments, the dereverberation mask (DM) and the enhanced ratio mask (ERM) are used as the training targets of two LSTM blocks, respectively. Detailed evaluations confirm that the proposed method outperforms the state-of-the-art methods.
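The lower-band idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The 16 kHz sampling rate, the 4 kHz cutoff, the frame and hop sizes, the synthetic signals, and the helper names `stft_mag`, `lower_band` and `ratio_mask` are all assumptions made for illustration. The mask is a generic ratio mask standing in for a training target, not the DM or ERM defined in the paper, and the LSTM blocks are omitted.

```python
import numpy as np


def stft_mag(x, frame_len=256, hop=128):
    """Magnitude STFT with a Hann window; shape (frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def lower_band(spec, fs, cutoff_hz):
    """Keep only the bins whose centre frequency is at or below cutoff_hz."""
    freqs = np.linspace(0.0, fs / 2.0, spec.shape[1])
    return spec[:, freqs <= cutoff_hz]


def ratio_mask(speech_mag, noise_mag, eps=1e-8):
    """Generic ratio mask in [0, 1]; a stand-in target, not the paper's ERM."""
    return speech_mag / (speech_mag + noise_mag + eps)


# Synthetic one-second signals (assumed data, for illustration only).
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 300.0 * t)  # low-frequency "speech" stand-in
noise = 0.3 * np.random.default_rng(0).standard_normal(fs)

# Network input: lower band of the mixture spectrogram.
mix_low = lower_band(stft_mag(speech + noise), fs, cutoff_hz=4000.0)

# Training target: mask computed on the same lower band.
target = ratio_mask(
    lower_band(stft_mag(speech), fs, cutoff_hz=4000.0),
    lower_band(stft_mag(noise), fs, cutoff_hz=4000.0),
)

print(mix_low.shape, target.shape)
```

Restricting the input and target to the lower bins roughly halves the feature dimension in this configuration, which is one way the computational saving mentioned in the abstract could arise.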
Author(s): Li Y, Sun Y, Naqvi SM
Publication type: Article
Publication status: In Press
Journal: IET Signal Processing
Acceptance date: 08/12/2020
Date deposited: 15/12/2020
ISSN (print): 1751-9675
ISSN (electronic): 1751-9683
Publisher: Institution of Engineering and Technology (IET)