TY - GEN
T1 - Reinforcement-Learning-Based Cooperative Dynamic Weapon-Target Assignment in a Multiagent Engagement
AU - Merkulov, Gleb
AU - Iceland, Eran
AU - Michaeli, Shay
AU - Gal, Oren
AU - Barel, Ariel
AU - Shima, Tal
N1 - Publisher Copyright: © 2025, American Institute of Aeronautics and Astronautics Inc., AIAA. All rights reserved.
PY - 2025
Y1 - 2025
N2 - This paper considers a multiagent Shoot-Shoot-Look engagement scenario, in which multiple pursuers act against multiple evaders with predefined motion. The pursuers are arranged in two successive waves: the first wave engages the a priori allocated evaders directly, and the second wave trails behind to assist with the pursuit of the evaders that survive the first wave. The scenario objective is to maximize the number of intercepted evaders under the given time constraints. To improve intercept performance, the second-wave pursuers are guided to intermediate virtual targets that allow several reallocation options once the actual first-wave engagement outcomes are known. The choice of the virtual targets influences the time and maneuver required from the pursuer during the engagement, which are related to the intercept probabilities. We formulate the second-wave allocation problem as a stochastic Markov Decision Process. Due to the special problem structure, a reward expression based on predictions of intercept probabilities is developed. Using this reward function, a Reinforcement-Learning-based strategy is proposed for virtual target allocation for the second-wave pursuers. An alternative Greedy algorithm is designed based on maximizing individual incremental contributions to the intercept probabilities. A sequential decentralized decision-making architecture is used to implement both approaches. The simulations demonstrate that the Reinforcement-Learning-based solution slightly outperforms the Greedy algorithm, and that the proposed greedy heuristic approximates the Reinforcement-Learning solution well.
AB - This paper considers a multiagent Shoot-Shoot-Look engagement scenario, in which multiple pursuers act against multiple evaders with predefined motion. The pursuers are arranged in two successive waves: the first wave engages the a priori allocated evaders directly, and the second wave trails behind to assist with the pursuit of the evaders that survive the first wave. The scenario objective is to maximize the number of intercepted evaders under the given time constraints. To improve intercept performance, the second-wave pursuers are guided to intermediate virtual targets that allow several reallocation options once the actual first-wave engagement outcomes are known. The choice of the virtual targets influences the time and maneuver required from the pursuer during the engagement, which are related to the intercept probabilities. We formulate the second-wave allocation problem as a stochastic Markov Decision Process. Due to the special problem structure, a reward expression based on predictions of intercept probabilities is developed. Using this reward function, a Reinforcement-Learning-based strategy is proposed for virtual target allocation for the second-wave pursuers. An alternative Greedy algorithm is designed based on maximizing individual incremental contributions to the intercept probabilities. A sequential decentralized decision-making architecture is used to implement both approaches. The simulations demonstrate that the Reinforcement-Learning-based solution slightly outperforms the Greedy algorithm, and that the proposed greedy heuristic approximates the Reinforcement-Learning solution well.
UR - http://www.scopus.com/inward/record.url?scp=105001415979&partnerID=8YFLogxK
U2 - 10.2514/6.2025-1546
DO - 10.2514/6.2025-1546
M3 - Conference contribution
SN - 9781624107238
T3 - AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2025
BT - AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2025
T2 - AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2025
Y2 - 6 January 2025 through 10 January 2025
ER -