This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
© 2024 by the authors.
Abstract: Accurate 3D object detection is essential for autonomous driving, yet traditional LiDAR-only models often struggle with sparse point clouds. To address this, we propose PLC-Fusion, a perspective-aware hierarchical vision-transformer-based LiDAR-camera fusion framework: an efficient, multi-modal 3D object detection pipeline that integrates LiDAR and camera data for improved performance. First, the method projects LiDAR data onto a 2D plane, enabling the Object Perspective Sampling (OPS) module to extract object perspective features from a probability map. A lightweight perspective detector, consisting of interconnected 2D and monocular 3D sub-networks, extracts image features and generates object perspective proposals by predicting and refining the top-scoring 3D candidates. Second, two independent transformers, CamViT for 2D image features and LidViT for 3D point cloud features, encode each modality; their ViT-based representations are fused by the Cross-Fusion module for hierarchical and deep representation learning, improving both accuracy and computational efficiency. These mechanisms make better use of semantic features within a region of interest (ROI) to obtain more representative point features, yielding a more effective fusion of information from the LiDAR and camera sources. PLC-Fusion outperforms existing methods, achieving a mean average precision (mAP) of 83.52% for 3D detection and 90.37% for BEV detection, while maintaining a competitive inference time of 0.18 s. Our model addresses computational bottlenecks by eliminating the need for dense BEV searches and global attention mechanisms while improving detection range and precision.
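The projection step described in the abstract (LiDAR points mapped onto a 2D image plane before perspective sampling) can be sketched as follows. This is a minimal illustration assuming the standard KITTI calibration convention; the matrix names (P2, R0_rect, Tr_velo_to_cam) follow the public KITTI devkit rather than the paper's code, and the function name is hypothetical.

```python
# Hypothetical sketch: project LiDAR points into the camera image plane,
# assuming KITTI-style calibration matrices (not the paper's actual code).
import numpy as np

def project_lidar_to_image(points, P2, R0_rect, Tr_velo_to_cam):
    """Project N x 3 LiDAR points (velodyne frame) to pixel coordinates.

    points         : (N, 3) xyz in the LiDAR frame
    P2             : (3, 4) left color camera projection matrix
    R0_rect        : (3, 3) rectifying rotation
    Tr_velo_to_cam : (3, 4) rigid transform from the LiDAR to the camera frame
    Returns (M, 2) pixel coordinates and (M,) depths for points in front
    of the camera.
    """
    n = points.shape[0]
    pts_h = np.hstack([points, np.ones((n, 1))])   # homogeneous coords (N, 4)
    cam = R0_rect @ (Tr_velo_to_cam @ pts_h.T)     # (3, N) rectified camera frame
    keep = cam[2] > 0                              # drop points behind the camera
    cam_h = np.vstack([cam[:, keep], np.ones((1, int(keep.sum())))])
    img = P2 @ cam_h                               # (3, M) image-plane coords
    uv = (img[:2] / img[2]).T                      # perspective divide -> (M, 2)
    return uv, img[2]
```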
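The dual-branch design in the abstract (independent CamViT and LidViT encoders joined by a Cross-Fusion module) can likewise be sketched with standard transformer blocks. This is an assumption-laden stand-in, not the paper's implementation: the module names, the cross-attention fusion design, and all token counts and dimensions are illustrative.

```python
# Minimal PyTorch sketch of a two-branch ViT encoder with cross-attention
# fusion, as a stand-in for the CamViT / LidViT / Cross-Fusion pipeline.
# Every architectural detail here is an assumption, not the published model.
import torch
import torch.nn as nn

class CrossFusion(nn.Module):
    """Fuse LiDAR tokens with image tokens via cross-attention."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, lidar_tokens, cam_tokens):
        # LiDAR tokens query the camera tokens; the residual connection keeps
        # the geometric branch dominant (a design choice assumed here).
        fused, _ = self.attn(lidar_tokens, cam_tokens, cam_tokens)
        return self.norm(lidar_tokens + fused)

class PLCFusionSketch(nn.Module):
    def __init__(self, dim=256, depth=4, heads=8):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.lid_vit = nn.TransformerEncoder(layer, depth)  # LidViT stand-in
        self.cam_vit = nn.TransformerEncoder(layer, depth)  # CamViT stand-in
        self.fusion = CrossFusion(dim, heads)

    def forward(self, lidar_tokens, cam_tokens):
        return self.fusion(self.lid_vit(lidar_tokens),
                           self.cam_vit(cam_tokens))

# Example: 512 LiDAR tokens and 196 image patch tokens, 256-d embeddings.
model = PLCFusionSketch()
out = model(torch.randn(2, 512, 256), torch.randn(2, 196, 256))
print(out.shape)  # torch.Size([2, 512, 256])
```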
Author(s): Mushtaq H, Deng X, Azhar F, Ali M, Sherazi HHR
Publication type: Article
Publication status: Published
Journal: Information
Year: 2024
Volume: 15
Issue: 11
Online publication date: 19/11/2024
Acceptance date: 14/11/2024
Date deposited: 09/12/2024
ISSN (electronic): 2078-2489
Publisher: MDPI
URL: https://doi.org/10.3390/info15110739
DOI: 10.3390/info15110739
Data Access Statement: The dataset created and examined in the present study can be accessed from the KITTI 3D object detection repository (https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d (accessed on 18 July 2023)).