
Cooperative encirclement method for multiple unmanned ground vehicles based on reinforcement learning

  • Abstract: This paper studies the cooperative encirclement problem for unmanned ground vehicles (UGVs) and proposes a cooperative encirclement algorithm based on the soft actor–critic (SAC) framework. To address the poor coordination among multiple UGVs, long short-term memory (LSTM) is added to the network structure as a memory function, helping each UGV make more robust decisions from its historical observation sequence. To counter the enlarged state-space dimension and reduced efficiency caused by introducing LSTM, an attention mechanism is introduced: attention weights are computed and selected over the state space so that attention is concentrated on task-relevant key states, which constrains the state-space dimension, preserves network stability, enables stable and efficient cooperation among the UGVs, and improves training efficiency. To solve the sparse-reward problem in cooperative encirclement tasks, a mixed reward function is proposed that divides the reward into an individual reward and a collaborative reward, so that UGVs receive more frequent reward signals during the encirclement. The individual reward encourages each UGV's motion by guiding it toward the target, while the collaborative reward motivates the group of UGVs to complete the encirclement task together, further accelerating convergence. Finally, simulations and experiments show that the proposed method converges faster than SAC, shortening the encirclement time by 15.1% and improving the success rate by 7.6%.
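    As a concrete illustration of the mixed reward described above, the following Python sketch splits the reward into a dense individual term (distance shaping toward the target) and a sparse collaborative term shared once the encirclement closes. The weights, capture radius, and function names here are illustrative assumptions, not the paper's exact formulation.

    # Minimal sketch of a mixed reward: individual shaping + shared collaborative bonus.
    # All names, weights, and thresholds are illustrative assumptions.
    import numpy as np

    def mixed_reward(pursuer_positions, target_position,
                     capture_radius=1.0, w_ind=0.1, w_coop=10.0):
        """Return per-pursuer rewards and whether the encirclement is complete."""
        pursuers = np.asarray(pursuer_positions, dtype=float)   # shape (n, 2)
        target = np.asarray(target_position, dtype=float)       # shape (2,)
        dists = np.linalg.norm(pursuers - target, axis=1)

        # Individual reward: dense shaping that pays each UGV for closing the
        # distance to the target (negative distance, so closer is better).
        r_individual = -w_ind * dists

        # Collaborative reward: a sparse bonus shared by all UGVs once every
        # pursuer is within the capture radius, i.e. the encirclement is complete.
        encircled = bool(np.all(dists <= capture_radius))
        r_collaborative = w_coop if encircled else 0.0

        return r_individual + r_collaborative, encircled

    # Example: three pursuers closing on a target at the origin.
    rewards, done = mixed_reward([[0.5, 0.2], [-0.4, 0.6], [0.1, -0.8]], [0.0, 0.0])

    Under this split, the individual term gives a learning signal at every step, while the collaborative term only fires when the group succeeds, which is the mechanism the abstract credits for mitigating reward sparsity.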

     

    Abstract: Collaborative encirclement of multiple unmanned ground vehicles (UGVs) is a focal challenge in the realm of multiagent collaborative tasks, representing a fundamental issue in complex undertakings such as multiagent collaborative search and interception. Although optimization algorithms have yielded rich research outcomes in collaborative encirclement, challenges persist, including poor real-time computational efficiency and weak robustness. Reinforcement learning theory holds considerable promise for addressing multiagent sequential decision problems. This paper delves into the study of the collaborative encirclement of multiple UGVs based on deep reinforcement learning theory, focusing on the following key aspects: establishing a kinematic model for UGVs to describe the collaborative encirclement task, detailing the collaborative encirclement process, developing strategies for target UGV escape, and addressing challenges arising from the increasing number of UGVs, which results in a complex environment and issues such as algorithmic instability, dimension explosion, and poor convergence. This paper introduces a collaborative encirclement algorithm based on the soft actor–critic (SAC) framework. To address issues related to poor collaboration and weak generalization among multiple UGVs, long short-term memory is incorporated into the network structure, serving as a memory function for UGVs. This tactic aids in capturing and using information from historical observation sequences, effectively processing time–series data, making more accurate decisions, promoting mutual collaboration among UGVs, and enhancing system stability. To tackle the issue of increased state space dimensions and low training efficiency during collaborative encirclement, an attention mechanism is introduced to calculate and select attention weights in the state space, focusing attention on key states relevant to the task. This strategy helps constrain state space dimensions, ensuring network stability, achieving stable and efficient collaboration among multiple UGVs, and improving algorithm training efficiency. To address the problem of sparse rewards in collaborative encirclement tasks, a mixed reward function is proposed that divides the reward function into individual and collaborative rewards. Individual rewards guide UGVs toward the target, incentivizing their motion behavior, whereas collaborative rewards motivate a group of UGVs to collectively accomplish the encirclement task. This approach further guides UGVs to obtain more frequent reward signals, ultimately enhancing the algorithm convergence speed. Simulation and experimental results demonstrate that the proposed method achieves faster convergence than SAC, with a 15.1% reduction in encirclement time and a 7.6% improvement in success rate. Finally, the improved algorithm developed in this paper is deployed on a UGV platform, and real-world experiments in typical encirclement scenarios validate its feasibility and effectiveness in embedded systems.
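    The abstract above describes wiring LSTM memory and an attention mechanism into the SAC networks. A minimal PyTorch sketch of such an observation encoder is given below; the layer sizes, the form of the attention scores, and all identifiers are assumptions made for illustration rather than the paper's reported architecture.

    # Sketch of an actor-side encoder combining attention over per-entity
    # observations with LSTM memory over time, assuming a SAC-style policy head
    # consumes the returned feature. Details are illustrative, not the paper's.
    import torch
    import torch.nn as nn

    class LSTMAttentionEncoder(nn.Module):
        def __init__(self, obs_dim, hidden_dim=64):
            super().__init__()
            # Embed each observed entity (teammates, target) into a common feature space.
            self.embed = nn.Linear(obs_dim, hidden_dim)
            # Scalar score per entity; softmax over entities gives the attention weights.
            self.score = nn.Linear(hidden_dim, 1)
            # LSTM provides memory over the history of attended observations.
            self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)

        def forward(self, obs_seq, hidden=None):
            # obs_seq: (batch, time, n_entities, obs_dim)
            feats = torch.relu(self.embed(obs_seq))            # (b, t, n, h)
            weights = torch.softmax(self.score(feats), dim=2)  # attention over entities
            attended = (weights * feats).sum(dim=2)            # (b, t, h) weighted summary
            out, hidden = self.lstm(attended, hidden)          # memory over the time axis
            return out[:, -1], hidden                          # latest feature for the policy head

    # Example: batch of 4 episodes, 10 time steps, 5 observed entities of dimension 6.
    enc = LSTMAttentionEncoder(obs_dim=6)
    z, h = enc(torch.randn(4, 10, 5, 6))   # z would feed the SAC policy/critic heads

    The attention weights compress the per-entity observations into a fixed-size summary before the LSTM, which is one way to keep the recurrent input dimension constrained as the number of UGVs grows, in line with the motivation stated above.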

     
