The current methodology for tackling the Acoustic Scene Classification (ASC) task can be described in two steps: preprocessing the audio waveform into a log-mel spectrogram, and then using it as the input representation for a Convolutional Neural Network (CNN). This paradigm shift occurred after DCASE 2016, where this framework achieved state-of-the-art results in ASC tasks [1], [2]. In this paper, we explore the use of harmonic and percussive source separation (HPSS) to split the audio into harmonic and percussive components. We then curate two CNNs that aim to understand the harmonic and percussive audio in their ‘natural form’: one specialised in extracting deep features in a time-biased domain and the other in a frequency-biased domain. The deep features extracted from these two CNNs are then combined using bilinear pooling, yielding a ‘two-stream’ time and frequency CNN architecture for classifying acoustic scenes. The model is evaluated on the DCASE 2019 Subtask 1A dataset and scores an average of ~65% on the development dataset and the Kaggle private and public leaderboards.
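As a rough illustration of the pipeline the abstract describes (not the authors' actual implementation), the sketch below uses librosa to split a clip with HPSS and compute log-mel spectrograms, and NumPy to bilinear-pool two CNN feature maps. The CNN backbones themselves are omitted, and all parameter values (sample rate, FFT size, mel bands) are illustrative assumptions rather than the settings used in the paper.

```python
# Minimal sketch of HPSS + log-mel preprocessing and bilinear pooling.
# Assumes librosa and numpy are installed; parameter values are placeholders.
import numpy as np
import librosa


def hpss_log_mel(path, sr=44100, n_fft=2048, hop_length=1024, n_mels=128):
    """Split a clip into harmonic/percussive parts and return their log-mel spectrograms."""
    y, sr = librosa.load(path, sr=sr)
    y_harm, y_perc = librosa.effects.hpss(y)  # harmonic / percussive source separation
    feats = []
    for src in (y_harm, y_perc):
        mel = librosa.feature.melspectrogram(
            y=src, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
        feats.append(librosa.power_to_db(mel, ref=np.max))  # log-mel spectrogram
    return feats  # [harmonic_log_mel, percussive_log_mel]


def bilinear_pool(fmap_a, fmap_b):
    """Combine two CNN feature maps (channels x freq x time) via bilinear pooling."""
    c_a, h, w = fmap_a.shape
    assert fmap_b.shape[1:] == (h, w), "feature maps must share spatial dimensions"
    # Outer product of channel activations at each location, averaged over locations.
    pooled = np.einsum('ahw,bhw->ab', fmap_a, fmap_b) / (h * w)
    vec = pooled.reshape(-1)
    vec = np.sign(vec) * np.sqrt(np.abs(vec))   # signed square-root
    return vec / (np.linalg.norm(vec) + 1e-12)  # L2 normalisation
```

In a full two-stream model, `fmap_a` and `fmap_b` would be the outputs of the time-biased and frequency-biased CNN streams, and the pooled vector would feed the final scene classifier.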
Author(s): Kek XY, Chin CS, Li Y
Publication type: Conference Proceedings (inc. Abstract)
Publication status: Published
Conference Name: 2019 IEEE Symposium Series on Computational Intelligence
Year of Conference: 2019
Print publication date: 01/05/2019
Online publication date: 11/04/2019
Acceptance date: 10/09/2019
ISSN: 1556-6048
Publisher: IEEE
URL: https://doi.org/10.1109/MCI.2019.2901101
DOI: 10.1109/MCI.2019.2901101