Drone-Based Visible–Thermal Object Detection with Transformers and Prompt Tuning
by: Rui Chen, Dongdong Li, Zhinan Gao, Yangliu Kuai, Chengyuan Wang
Format: | Article |
---|---|
Published: | MDPI AG 2024-09-01 |
Description
The use of unmanned aerial vehicles (UAVs) for visible–thermal object detection has emerged as a powerful technique to improve accuracy and resilience in challenging contexts, including dim lighting and severe weather conditions. However, most existing research relies on Convolutional Neural Network (CNN) frameworks, limiting the application of the Transformer’s attention mechanism to mere fusion modules and neglecting its potential for comprehensive global feature modeling. In response to this limitation, this study introduces an innovative dual-modal object detection framework called Visual Prompt multi-modal Detection (VIP-Det) that harnesses the Transformer architecture as the primary feature extractor and integrates vision prompts for refined feature fusion. Our approach begins with the training of a single-modal baseline model to solidify robust model representations, which is then refined through fine-tuning that incorporates additional modal data and prompts. Tests on the DroneVehicle dataset show that our algorithm achieves remarkable accuracy, outperforming comparable Transformer-based methods. These findings indicate that our proposed methodology marks a significant advancement in the realm of UAV-based object detection, holding significant promise for enhancing autonomous surveillance and monitoring capabilities in varied and challenging environments.