Abstract
In 5G networks, Device-To-Device (D2D) communications aim to provide dense coverage without relying on the cellular network infrastructure. To achieve this goal, D2D links are expected to self-organize and allocate finite, interfering resources with limited inter-link coordination. We consider a dense ad-hoc D2D network and propose a decentralized time-frequency allocation mechanism that achieves sub-linear social regret with respect to the optimal spectrum efficiency. The proposed mechanism is constructed in the framework of multi-agent multi-armed bandits and employs a carrier-sensing-based distributed auction to learn, from scratch, the optimal allocation of time-frequency blocks with different channel state dynamics. Our theoretical analysis shows that the proposed fully distributed mechanism achieves a logarithmic regret bound by adopting an epoch-based strategy-learning scheme in which the length of the strategy-exploitation window grows exponentially. We further propose an implementation-friendly protocol featuring a fixed exploitation window, which guarantees a good tradeoff between performance optimality and protocol efficiency. Numerical simulations demonstrate that the proposed protocol achieves higher efficiency than the prevalent reference algorithms in both static and dynamic wireless environments.
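The link between the exploitation-window length and the regret order can be illustrated with a simple counting sketch. It is not the paper's algorithm or proof: it only counts how many rounds an epoch-based scheme spends exploring when each epoch has a fixed-length exploration phase followed by an exploitation window. The function names, the exploration length of 10 rounds, and the window sizes are hypothetical values chosen for illustration.

```python
def epoch_schedule(horizon, explore_len, exploit_len):
    """Split `horizon` rounds into epochs of (exploration, exploitation) lengths.

    Epoch k explores for `explore_len` rounds, then exploits for
    `exploit_len(k)` rounds. The last epoch is truncated at the horizon.
    All names and parameters here are illustrative assumptions.
    """
    schedule = []
    used = 0
    k = 0
    while used < horizon:
        explore = min(explore_len, horizon - used)
        used += explore
        exploit = min(exploit_len(k), horizon - used)
        used += exploit
        schedule.append((explore, exploit))
        k += 1
    return schedule


def exploration_rounds(horizon, explore_len, exploit_len):
    """Total number of rounds spent exploring within the horizon."""
    return sum(e for e, _ in epoch_schedule(horizon, explore_len, exploit_len))


# Exponentially growing window: the number of epochs grows like log2(horizon),
# so the exploration overhead grows logarithmically.
growing = exploration_rounds(1000, 10, lambda k: 2 ** k)

# Fixed window of 50 rounds: the number of epochs grows linearly with the
# horizon, so the exploration overhead is a constant fraction of all rounds.
fixed = exploration_rounds(1000, 10, lambda k: 50)

print(growing, fixed)
```

Under these assumed values, the growing window needs far fewer exploration rounds over the same horizon. A full regret analysis would also have to bound the loss from exploiting a wrongly learned allocation, which this count ignores.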
| Original language | English |
|---|---|
| Pages (from-to) | 3149-3163 |
| Number of pages | 15 |
| Journal | IEEE Transactions on Signal Processing |
| Volume | 71 |
| State | Published - 2023 |
Keywords
- D2D networks
- Multi-Agent multi-Armed bandit
- distributed network management
- resource allocation
All Science Journal Classification (ASJC) codes
- Signal Processing
- Electrical and Electronic Engineering
Fingerprint
Dive into the research topics of 'Distributed Learning for Optimal Spectrum Access in Dense Device-To-Device Ad-Hoc Networks'. Together they form a unique fingerprint.