Keywords:
UAV
Pursuit-evasion differential game
Multi-agent reinforcement learning
MASAC
Abstract:
Traditional pursuit-evasion differential game models are difficult to solve in complex real-world environments, particularly under incomplete information and high computational complexity. To address this, this paper proposes an improved multi-agent reinforcement learning method based on the Soft Actor-Critic (SAC) algorithm and applies it to the differential game in which unmanned aerial vehicles (UAVs) pursue a single intelligent target. The advantage of SAC in pursuit-evasion differential games lies in its natural realization of the mixed-strategy concept: its stochastic policy copes with dynamic changes in the opponent's behavior while providing strong exploration, stability, and robustness. Compared with other reinforcement learning algorithms, SAC is better suited to games with strong uncertainty, complex opponent behavior, and continuous action spaces. We assume a partially observable environment in which neither the pursuer nor the evader has access to full information; each agent makes decisions from partial observations of the environment. To solve this continuous optimization problem, we adopt the multi-agent Soft Actor-Critic (MASAC) algorithm, under which the pursuing and evading agents learn their respective optimal strategies through interaction with the environment. Finally, experiments demonstrate the applicability and potential of the improved multi-agent reinforcement learning method in UAV pursuit-evasion scenarios under partial observability.
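The claim that SAC "naturally implements mixed strategies through randomness" refers to the maximum-entropy objective of the standard SAC formulation, reproduced here for reference (this equation is from the general SAC literature, not a result of this paper): the policy maximizes expected return plus an entropy bonus weighted by a temperature parameter \(\alpha\),

\[
J(\pi) = \sum_{t=0}^{T} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[\, r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right].
\]

Because the entropy term penalizes deterministic behavior, the learned policy remains a distribution over actions rather than a single fixed maneuver, which is what lets it behave like a mixed strategy against an adapting opponent.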
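As a concrete illustration of the MASAC setup, the sketch below shows an entropy-regularized actor update for one agent under a centralized critic (the common centralized-training, decentralized-execution pattern). This is a minimal PyTorch sketch under assumed interfaces: the network sizes, the fixed temperature `alpha`, and the names `GaussianActor` and `actor_loss` are illustrative assumptions, not the paper's implementation.

```python
# Minimal MASAC-style actor update sketch (illustrative, not the paper's code).
# Assumes a centralized critic Q(joint_obs, joint_actions) per agent (CTDE).
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianActor(nn.Module):
    """Squashed-Gaussian policy: outputs a stochastic action in [-1, 1]."""
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, act_dim)
        self.log_std = nn.Linear(hidden, act_dim)

    def forward(self, obs):
        h = self.net(obs)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-20, 2)
        dist = Normal(mu, log_std.exp())
        u = dist.rsample()                 # reparameterized sample
        a = torch.tanh(u)                  # squash to the [-1, 1] action box
        # Log-probability with the tanh change-of-variables correction.
        logp = dist.log_prob(u).sum(-1) - torch.log(1 - a.pow(2) + 1e-6).sum(-1)
        return a, logp

def actor_loss(actor, critic, obs_i, joint_obs, other_actions, alpha=0.2):
    """Entropy-regularized actor loss for agent i under a centralized critic."""
    a_i, logp_i = actor(obs_i)
    joint_act = torch.cat([a_i, other_actions], dim=-1)
    q = critic(torch.cat([joint_obs, joint_act], dim=-1))
    return (alpha * logp_i - q.squeeze(-1)).mean()

if __name__ == "__main__":
    torch.manual_seed(0)
    obs_dim, act_dim, n_agents = 6, 2, 2   # e.g., one pursuer, one evader
    actor = GaussianActor(obs_dim, act_dim)
    # Centralized critic over the joint observation and joint action.
    critic = nn.Sequential(nn.Linear(n_agents * (obs_dim + act_dim), 64),
                           nn.ReLU(), nn.Linear(64, 1))
    obs_i = torch.randn(32, obs_dim)
    joint_obs = torch.randn(32, n_agents * obs_dim)
    other_act = torch.rand(32, act_dim) * 2 - 1
    loss = actor_loss(actor, critic, obs_i, joint_obs, other_act)
    loss.backward()   # gradients flow through the reparameterized sample
    print(float(loss))
```

In a full training loop each agent would additionally maintain twin critics and target networks minimizing the soft Bellman residual; under partial observability, `obs_i` would contain only the agent's local observation while the centralized critic sees the joint state during training.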