<span id="fpn9h"><noframes id="fpn9h"><span id="fpn9h"></span>
<span id="fpn9h"><noframes id="fpn9h">
<th id="fpn9h"></th>
<strike id="fpn9h"><noframes id="fpn9h"><strike id="fpn9h"></strike>
<th id="fpn9h"><noframes id="fpn9h">
<span id="fpn9h"><video id="fpn9h"></video></span>
<ruby id="fpn9h"></ruby>
<strike id="fpn9h"><noframes id="fpn9h"><span id="fpn9h"></span>

Digital twin-based obstacle avoidance method for unmanned aerial vehicle formation control using deep reinforcement learning

  • Abstract: UAV swarms play an important role across many fields and have a wealth of application scenarios. However, applying deep reinforcement learning methods to autonomous UAVs faces severe challenges. Based on multi-agent deep reinforcement learning, this paper builds the state space of each individual UAV from local information and trains the policy network with the on-policy multi-agent proximal policy optimization (MAPPO) algorithm, thereby overcoming environmental uncertainty and the dependence on global information. The concept of the digital twin is also introduced, offering a new approach for resource-intensive algorithms. To address the difficulty of sampling and the strain on resources, an architecture for training the UAV formation obstacle-avoidance policy model is built on digital twin technology. First, multiple digital twin environments are constructed in which the reinforcement learning algorithm samples interactively and pre-trains before the task begins, equipping the swarm with basic task capability. Supplementary training with data collected in the real environment then enables the swarm to complete the task better. The effect of adopting this two-stage training architecture is evaluated by comparison, and comparisons against other policy algorithms verify the sample efficiency of MAPPO. Finally, a real-flight validation test is designed, confirming the practicality and reliability of the policy model obtained from the twin environments.
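The state space built from local information mentioned above can be pictured with a short sketch. The Python snippet below is a minimal illustration only: the `local_observation` helper, the sensing radius, and the zero-masking of out-of-range entities are assumptions made for this example, not the paper's actual encoding.

```python
import numpy as np

def local_observation(own_pos, own_vel, neighbor_pos, obstacle_pos, sense_radius=5.0):
    """Assemble one UAV's observation from locally sensed quantities only.

    Entities outside sense_radius are zero-masked so the vector stays
    fixed-length and the policy never depends on global state.
    Illustrative encoding only.
    """
    own_pos = np.asarray(own_pos, dtype=float)
    own_vel = np.asarray(own_vel, dtype=float)
    parts = [own_pos, own_vel]
    for p in list(neighbor_pos) + list(obstacle_pos):
        rel = np.asarray(p, dtype=float) - own_pos   # relative position
        in_range = np.linalg.norm(rel) <= sense_radius
        parts.append(rel if in_range else np.zeros_like(rel))
    return np.concatenate(parts)

# Example: two neighbors (one out of sensing range) and one obstacle
obs = local_observation(own_pos=[0, 0, 1], own_vel=[1, 0, 0],
                        neighbor_pos=[[2, 1, 1], [9, 9, 9]],
                        obstacle_pos=[[0, 3, 1]])
print(obs.shape)  # (15,): own position and velocity plus three relative vectors
```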

     

    Abstract: Unmanned aerial vehicle (UAV) swarms have found extensive applications in various fields, playing a crucial role in cluster collaboration. These swarms involve multiple UAVs that work together to achieve common objectives. A key challenge in swarm operations is collision-free formation control. To solve this problem, deep reinforcement learning methods have received significant attention, but their application on autonomous UAVs poses challenges, including dependency on global information during training, difficulties in sampling, and excessive resource utilization. To overcome these challenges, this work proposes a novel approach based on multi-agent deep reinforcement learning (MARL) for collision-free formation control of UAV swarms. MARL allows each UAV to interact with a dynamic environment that includes the other UAVs, enabling collaborative decision-making and adaptive behavior. We leverage local information to establish the state space of each individual UAV. To train the policy network, we employ the multi-agent proximal policy optimization (MAPPO) algorithm, which allows robust learning and policy optimization in a multi-agent setting. We also address the issues of sampling difficulty and resource constraints with digital twin technology, which bridges physical entities and virtual models and offers a novel approach to the intelligent collaborative control of drone swarms. By establishing models in virtual space, digital twin technology simulates the real-world space so that the reinforcement learning algorithm can be pre-trained on synthetic experience. We construct multiple digital twin environments to facilitate interactive sampling and pre-train the swarm with basic task capabilities. We then supplement the training with data collected in real environments, enhancing the swarm's ability to perform in real-world scenarios. To evaluate the effectiveness of our approach, we compare the effect of adopting the two-stage training architecture; to validate the sample efficiency of the on-policy MAPPO algorithm, we conduct a comparative analysis with other policy algorithms, particularly off-policy ones. The results reveal the superior sample efficiency and stability of MAPPO in addressing collision-free formation control. Finally, we conduct a real-flight test to validate the practicality and reliability of the policy model derived from the digital twin environments. Overall, this work demonstrates the effectiveness of the proposed approach in enabling UAV swarms to navigate complex environments and achieve collision-free formation control.
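To make the two-stage schedule described above concrete, the sketch below mimics it in plain Python/NumPy: pre-training rollouts are first collected from several digital-twin environments, then a shorter round of updates uses data from the physical system. Everything here (the `TwinEnv` class, the toy reward, and the simplified policy-gradient step standing in for MAPPO's clipped surrogate update) is a hypothetical stand-in under stated assumptions, not the paper's implementation.

```python
import numpy as np

class TwinEnv:
    """Stand-in for one digital-twin environment. A real twin would mirror the
    physical vehicles' dynamics and obstacle layout; this is a toy linear system."""
    def __init__(self, seed):
        self.rng = np.random.default_rng(seed)

    def rollout(self, policy, steps=64):
        """Run the current policy and return a list of (state, action, reward)."""
        s = self.rng.normal(size=4)
        traj = []
        for _ in range(steps):
            a = policy(s)
            r = -float(np.linalg.norm(s))  # toy reward: stay near the formation point
            traj.append((s, a, r))
            s = 0.9 * s + 0.1 * a + self.rng.normal(scale=0.05, size=4)
        return traj

def make_policy(theta):
    return lambda s: np.tanh(theta @ s)

def update(theta, trajs, lr=1e-3):
    """Simplified policy-gradient step; a real system would apply MAPPO's
    clipped surrogate objective with a centralized critic here."""
    for traj in trajs:
        for s, a, r in traj:
            theta = theta + lr * r * np.outer(a - np.tanh(theta @ s), s)
    return theta

theta = np.random.default_rng(0).normal(scale=0.1, size=(4, 4))

# Stage 1: pre-train on samples drawn from several digital-twin environments,
# giving the swarm basic task capability before any real flight.
twins = [TwinEnv(seed=i) for i in range(4)]
for _ in range(10):
    theta = update(theta, [env.rollout(make_policy(theta)) for env in twins])

# Stage 2: supplementary training with real-environment data (faked here by
# one more TwinEnv; in practice this would come from real flight logs).
real_env = TwinEnv(seed=99)
for _ in range(3):
    theta = update(theta, [real_env.rollout(make_policy(theta))])
```

The point of the split is that the expensive, failure-prone exploration happens in the twins, while only a comparatively small amount of real flight data is needed to adapt the policy to the physical environment.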

     

<span id="fpn9h"><noframes id="fpn9h"><span id="fpn9h"></span>
<span id="fpn9h"><noframes id="fpn9h">
<th id="fpn9h"></th>
<strike id="fpn9h"><noframes id="fpn9h"><strike id="fpn9h"></strike>
<th id="fpn9h"><noframes id="fpn9h">
<span id="fpn9h"><video id="fpn9h"></video></span>
<ruby id="fpn9h"></ruby>
<strike id="fpn9h"><noframes id="fpn9h"><span id="fpn9h"></span>
www.77susu.com