Abstract
Coalition formation concerns autonomous agents that interact strategically to form self-organized coalitions. When agents initially lack sufficient information to evaluate their preferences, they must learn them online through repeated feedback while iteratively forming coalitions. In this work, we introduce online learning in coalition formation from a non-cooperative perspective, studying the impact of collective data utilization, where selfish agents aim to accelerate their learning by leveraging a shared data platform. The efficiency and dynamics of the learning process thus depend on each agent's local feedback, motivating us to explore the tension between semi-bandit and bandit feedback, which differ in the granularity of the utility information each agent observes. Under our non-cooperative viewpoint, we evaluate the system by means of Nash stability, where no agent can improve her utility by unilaterally deviating. Our main result is a sample-efficient algorithm that allows selfish agents to minimize their Nash regret under both semi-bandit and bandit feedback, yielding approximately Nash-stable outcomes. Under both feedback settings, our algorithm enjoys Nash regret and sample complexity bounds that are optimal up to logarithmic factors.
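As a point of reference, the stability notion named in the abstract can be formalized as in the sketch below. The notation ($N$, $\pi$, $u_i$, $\varepsilon$) is an illustrative assumption and is not taken from the paper itself.

```latex
% Illustrative formalization (assumed notation, not the paper's own).
% Let N be the set of agents, \pi a coalition structure (a partition of N),
% u_i(\pi) agent i's utility under \pi, and \pi_{i \to C} the structure
% obtained when agent i unilaterally deviates to coalition C
% (possibly the singleton coalition \{i\}).
%
% Nash stability: no agent gains from any unilateral deviation.
\[
  \forall i \in N,\ \forall C :\quad u_i(\pi) \;\ge\; u_i\bigl(\pi_{i \to C}\bigr).
\]
% \varepsilon-Nash stability relaxes this to
%   u_i(\pi) \ge u_i(\pi_{i \to C}) - \varepsilon,
% and a Nash-regret objective accumulates, over the learning rounds,
% each agent's per-round gain from her best unilateral deviation.
```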
| Original language | English |
| --- | --- |
| Pages (from-to) | 13709-13717 |
| Number of pages | 9 |
| Journal | Proceedings of the AAAI Conference on Artificial Intelligence |
| Volume | 39 |
| Issue number | 13 |
| DOIs | |
| State | Published - 11 Apr 2025 |
| Event | 39th Annual AAAI Conference on Artificial Intelligence, AAAI 2025, Philadelphia, United States. Duration: 25 Feb 2025 → 4 Mar 2025 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence