Fusing spatial and frequency features for compositional zero-shot image classification

Lookup NU author(s): Dr Shidong Wang


Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

© 2024 Elsevier Ltd. Compositional Zero-Shot Learning (CZSL) is a Zero-Shot Learning (ZSL) task that uses known concepts (e.g., states and objects) to identify novel state-object compositions in image classification. Previous works have primarily focused on disentangling concept compositions or exploring the complex interactions between states and objects, while neglecting the fact that inferring many states and compositions depends on different frequency components, which should be analyzed from a global perspective. Therefore, we propose a Spatial-frequency Feature Fusion Network (SFFNet), which introduces a new branch that uses a frequency-domain filtering encoder to enhance key frequency components and adaptively capture non-local interactions. We also find that the backbone widely used in conventional CZSL settings is better at perceiving local features. Thus, we construct a fusion block that combines both strengths to capture local and non-local information. In addition, the traditional one-hot ground-truth distribution used during training does not reflect the actual relationships between compositions, so we propose a composition-relation-based label distribution regularization that encourages the model to learn the inner relationships between compositions. We extend this method to construct a pseudo distribution over unseen compositions, further enhancing the model's generalization to them. Extensive experiments and detailed analysis on three popular datasets show that our method achieves state-of-the-art performance in identifying novel compositions. Code is available at https://github.com/lisuyi/SFFNet_czsl.
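
The two ideas named in the abstract, frequency-domain filtering and relation-based label distributions, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (their code is at the repository linked above); it only shows the general techniques under stated assumptions. The function names `frequency_filter` and `relation_soft_labels`, the `temperature` and `alpha` parameters, and the use of cosine similarity between composition embeddings are all hypothetical choices made for illustration.

```python
import numpy as np


def frequency_filter(features: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Reweight the 2D frequency components of an (H, W, C) feature map.

    `weights` has shape (H, W // 2 + 1, C). In a trained model it would be
    learned; here it is passed in as a hypothetical stand-in.
    """
    h, w, _ = features.shape
    # Spatial domain -> frequency domain over the two spatial axes.
    spectrum = np.fft.rfft2(features, axes=(0, 1))
    # Each frequency coefficient depends on every spatial position, so
    # scaling it acts globally (non-locally) on the feature map.
    filtered = spectrum * weights
    # Frequency domain -> spatial domain, restoring the original size.
    return np.fft.irfft2(filtered, s=(h, w), axes=(0, 1))


def relation_soft_labels(
    embeddings: np.ndarray,
    target: int,
    temperature: float = 0.1,
    alpha: float = 0.8,
) -> np.ndarray:
    """Blend a one-hot target with a similarity-based label distribution.

    `embeddings` is an (N, D) array with one row per composition. Cosine
    similarity is an assumed measure of how related two compositions are.
    """
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[target]           # similarity to the true class
    logits = sims / temperature
    logits = logits - logits.max()           # numerical stability
    relation = np.exp(logits) / np.exp(logits).sum()
    one_hot = np.zeros(len(embeddings))
    one_hot[target] = 1.0
    # Keep most of the mass on the true class, spread the rest by relation.
    return alpha * one_hot + (1.0 - alpha) * relation
```

With all-ones `weights`, `frequency_filter` returns its input unchanged, which is a convenient sanity check. The soft labels returned by `relation_soft_labels` sum to one and could replace a one-hot target in a cross-entropy loss.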


Publication metadata

Author(s): Li S, Jiang C, Ye Q, Wang S, Yang W, Zhang H

Publication type: Article

Publication status: Published

Journal: Expert Systems with Applications

Year: 2024

Volume: 258

Print publication date: 15/12/2024

Online publication date: 28/08/2024

Acceptance date: 12/08/2024

Date deposited: 08/11/2024

ISSN (print): 0957-4174

ISSN (electronic): 1873-6793

Publisher: Elsevier Ltd

URL: https://doi.org/10.1016/j.eswa.2024.125230

DOI: 10.1016/j.eswa.2024.125230

ePrints DOI: 10.57711/vzpp-wp04

Data Access Statement: The employed datasets are publicly available.



Funding

Funder name: Key Research and Development Plan of Jiangsu Province (Industry Foresight and Key Core Technology Project), China; Funder reference: BE2023008-2
Funder name: National Natural Science Foundation of China; Funder reference: 62371235, 62072246
