Automatic detection and recognition of safety helmet wearing based on video analysis is an important means of ensuring production safety. However, complex environments and changeable factors make accurate detection and recognition of safety helmet wearing challenging. Helmet detection methods are generally divided into traditional machine learning methods and deep learning methods. Traditional machine learning methods rely on manually selected or statistical features and suffer from poor model stability. Deep learning-based methods are divided into "two-stage" and "one-stage" methods: the "two-stage" method has high detection accuracy but cannot achieve real-time detection, while the "one-stage" method is fast but less accurate. Achieving both accuracy and real-time performance is an important challenge in the development of video-based helmet detection, since correct and fast detection of helmets enables effective real-time monitoring of production sites. To this end, this paper proposes DS-YOLOv5, a real-time helmet detection and recognition model based on YOLOv5. It mainly addresses the following problems: first, CNN models extract insufficient global information; second, existing algorithms lack robustness to multiple targets and occlusion in video scenes; third, feature extraction for multi-scale targets is insufficient. To address these problems, the model first takes advantage of improved Deep SORT multi-target tracking to reduce the rate of missed detections under multi-target and occlusion conditions and to increase the error tolerance in video detection. Second, a simplified Transformer module (transformer block) is integrated into the backbone network to enhance the capture of global information from images and thus improve the learning of features from small targets. Finally, the Bidirectional Feature Pyramid Network (BiFPN) is applied to fuse multi-scale features, which better adapts to target scale changes caused by varying photographic distance.
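The multi-scale fusion step can be illustrated with a minimal sketch of BiFPN's fast normalized weighted fusion, in which each input feature map is scaled by a learnable non-negative weight before summation. This is a standalone NumPy illustration of the general technique, not the paper's actual implementation; the function name and fixed weights are hypothetical.

```python
import numpy as np

def weighted_fusion(features, weights, eps=1e-4):
    """Fast normalized fusion in the BiFPN style: ReLU-clipped weights are
    normalized to sum to ~1, then used to blend same-shape feature maps.
    (Hypothetical sketch; learnable weights would come from training.)"""
    w = np.maximum(np.asarray(weights, dtype=np.float64), 0.0)  # keep weights non-negative
    w = w / (w.sum() + eps)  # cheap normalization instead of softmax
    return sum(wi * f for wi, f in zip(w, features))

# Example: fuse two 4x4 feature maps with raw weights 2 and 1
f1 = np.ones((4, 4))
f2 = np.zeros((4, 4))
fused = weighted_fusion([f1, f2], [2.0, 1.0])  # each entry is ~2/3
```

In a full BiFPN, such fusion nodes are stacked along both top-down and bottom-up pathways, so each output level mixes information from neighboring scales.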
The DS-YOLOv5 model was validated on the GDUT-HWD dataset through ablation and comparison experiments, and the tracking capability of the improved Deep SORT was evaluated on the public pedestrian dataset MOT. Comparative results against five "one-stage" methods and four helmet detection and recognition models demonstrate that the proposed model handles occlusion and target scale changes better. Its mAP reaches 95.5%, which is superior to that of the other helmet detection and recognition methods.