Abstract
This paper addresses the bandit game problem under the risk of privacy leakage, in which cooperative players aim to learn the optimal action profile that minimizes the global cost. The players do not have closed-form expressions for their payoff functions and receive only feedback on their local costs. We propose a privacy-preserving distributed bandit learning algorithm based on the residual gradient estimator, which uses stochastic quantization with a binary randomized response scheme to mask action profile estimates before communication. The theoretical analysis shows that the algorithm achieves an expected regret of order O(T^{3/4}) and preserves ε_dp-differential privacy for the players.
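To make the masking step more concrete, the following is a minimal sketch of how stochastic quantization can be combined with binary randomized response for a single scalar. It is not the paper's algorithm: the function names, the bounds `lo` and `hi`, and the debiasing step are assumptions introduced here for illustration. The value is first rounded at random to one bit so that its expectation is preserved, and the bit is then kept with probability e^ε / (1 + e^ε) and flipped otherwise, which is the standard ε-differentially private randomized response for one bit.

```python
import math
import random


def stochastic_quantize(x, lo, hi, rng):
    """Map x in [lo, hi] to one bit; the bit is 1 with probability (x - lo) / (hi - lo)."""
    x = min(max(x, lo), hi)  # clip so the probability below stays in [0, 1]
    p_one = (x - lo) / (hi - lo)
    return 1 if rng.random() < p_one else 0


def keep_probability(epsilon):
    """Probability of reporting the true bit under binary randomized response."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomized_response(bit, epsilon, rng):
    """Report the bit truthfully with probability keep_probability(epsilon), else flip it."""
    return bit if rng.random() < keep_probability(epsilon) else 1 - bit


def debias(reported_bit, epsilon, lo, hi):
    """Turn a reported bit into an unbiased estimate of the original value.

    E[reported] = (2 * keep - 1) * p + (1 - keep), where p is the probability
    that the quantized bit is 1, so the expression is inverted for p.
    """
    keep = keep_probability(epsilon)
    p_hat = (reported_bit - (1.0 - keep)) / (2.0 * keep - 1.0)
    return lo + (hi - lo) * p_hat


def mask(x, epsilon, lo, hi, rng):
    """Hypothetical sender side: quantize the value, then randomize the bit."""
    return randomized_response(stochastic_quantize(x, lo, hi, rng), epsilon, rng)
```

A receiver would call `debias` on each bit it gets. A single estimate is very noisy, but averaging many of them recovers the masked value in expectation, and a smaller ε gives stronger privacy at the cost of higher variance.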
Original language | English |
---|---|
Journal | IEEE Transactions on Automatic Control |
State | Accepted/In press - 2025 |
Keywords
- Bandit games
- cooperative optimization
- privacy preservation
- stochastic quantization
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Computer Science Applications
- Electrical and Electronic Engineering