
Open Access

PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles

Lookup NU author(s): Dr Husnain Sherazi (ORCiD)

Licence

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).


Abstract

© 2024 by the authors. Accurate 3D object detection is essential for autonomous driving, yet traditional LiDAR-only models often struggle with sparse point clouds. To address this, we propose perspective-aware hierarchical vision transformer-based LiDAR-camera fusion (PLC-Fusion), an efficient multi-modal 3D object detection framework that integrates LiDAR and camera data for improved performance. First, our method enhances LiDAR data by projecting it onto a 2D plane, enabling the extraction of object perspective features from a probability map via the Object Perspective Sampling (OPS) module. A lightweight perspective detector, consisting of interconnected 2D and monocular 3D sub-networks, extracts image features and generates object perspective proposals by predicting and refining top-scoring 3D candidates. Second, the framework leverages two independent transformers: CamViT for 2D image features and LidViT for 3D point-cloud features. These ViT-based representations are fused via the Cross-Fusion module for hierarchical and deep representation learning, improving both performance and computational efficiency. Together, these mechanisms enhance the use of semantic features within a region of interest (ROI) to obtain more representative point features, leading to more effective fusion of LiDAR and camera information. PLC-Fusion outperforms existing methods, achieving a mean average precision (mAP) of 83.52% for 3D detection and 90.37% for BEV detection while maintaining a competitive inference time of 0.18 s. The model avoids key computational bottlenecks by eliminating dense BEV searches and global attention mechanisms while improving detection range and precision.
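As a reading aid, the two-branch design described in the abstract (independent LiDAR and camera transformer encoders joined by a cross-attention fusion stage) can be sketched in a few lines of PyTorch. This is a minimal sketch, not the authors' implementation: the names (PLCFusionSketch, make_vit_encoder), token shapes, and the single cross-attention layer standing in for the Cross-Fusion module are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

def make_vit_encoder(dim: int = 256, heads: int = 8, layers: int = 4) -> nn.Module:
    # Stand-in for LidViT/CamViT: a plain transformer encoder over
    # pre-tokenized features (hypothetical; the paper's encoders differ).
    layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=layers)

class PLCFusionSketch(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.lid_vit = make_vit_encoder(dim, heads)  # 3D point-cloud token branch
        self.cam_vit = make_vit_encoder(dim, heads)  # 2D image token branch
        # One cross-attention layer standing in for the Cross-Fusion module.
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Toy box head: (x, y, z, w, l, h, yaw) per LiDAR token.
        self.head = nn.Linear(dim, 7)

    def forward(self, lid_tokens: torch.Tensor, cam_tokens: torch.Tensor) -> torch.Tensor:
        lid = self.lid_vit(lid_tokens)
        cam = self.cam_vit(cam_tokens)
        # LiDAR tokens query image features, enriching points with
        # camera semantics before box prediction.
        fused, _ = self.cross(lid, cam, cam)
        return self.head(fused)

# Toy usage: batch of 2, with 32 LiDAR tokens and 64 image tokens per sample.
model = PLCFusionSketch()
lid = torch.randn(2, 32, 256)
cam = torch.randn(2, 64, 256)
print(model(lid, cam).shape)  # torch.Size([2, 32, 7])
```

Using the LiDAR tokens as queries mirrors the abstract's emphasis on enriching ROI point features with image semantics; the actual OPS, CamViT, and LidViT modules are considerably more elaborate than this sketch.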


Publication metadata

Author(s): Mushtaq H, Deng X, Azhar F, Ali M, Sherazi HHR

Publication type: Article

Publication status: Published

Journal: Information

Year: 2024

Volume: 15

Issue: 11

Online publication date: 19/11/2024

Acceptance date: 14/11/2024

Date deposited: 09/12/2024

ISSN (electronic): 2078-2489

Publisher: MDPI

URL: https://doi.org/10.3390/info15110739

DOI: 10.3390/info15110739

Data Access Statement: The dataset created and examined in the present study can be accessed from the KITTI 3D object detection repository (https://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=3d (accessed on 18 July 2023)).




Funding

Funder name (funder reference):
Key Project of Shenzhen City Special Fund for Fundamental Research (202208183000751)
Local Science and Technology Developing Foundation Guided by the Central Government of China (Free Exploration project 2021Szvup166)
National Natural Science Foundation of China Project (62172441, 62172449)
National Natural Science Foundation of Hunan Province (2023JJ30696)
Opening Project of State Key Laboratory of Nickel and Cobalt Resources Comprehensive Utilization (GZSYS-KY-2022-018, GZSYS-KY-2022-024)
