TY - GEN
T1 - Stochastic bandits with pathwise constraints
AU - Avner, Orly
AU - Mannor, Shie
PY - 2011
Y1 - 2011
N2 - We consider the problem of stochastic bandits, with the goal of maximizing a reward while satisfying pathwise constraints. The motivation for this problem comes from cognitive radio networks, in which agents need to choose between different transmission profiles to maximize throughput under certain operational constraints such as limited average power. Stochastic bandits serve as a natural model for an unknown, stationary environment. We propose an algorithm, based on a steering approach, and analyze its regret with respect to the optimal stationary policy that knows the statistics of the different arms.
AB - We consider the problem of stochastic bandits, with the goal of maximizing a reward while satisfying pathwise constraints. The motivation for this problem comes from cognitive radio networks, in which agents need to choose between different transmission profiles to maximize throughput under certain operational constraints such as limited average power. Stochastic bandits serve as a natural model for an unknown, stationary environment. We propose an algorithm, based on a steering approach, and analyze its regret with respect to the optimal stationary policy that knows the statistics of the different arms.
UR - http://www.scopus.com/inward/record.url?scp=84860661203&partnerID=8YFLogxK
U2 - https://doi.org/10.1109/CDC.2011.6161093
DO - 10.1109/CDC.2011.6161093
M3 - Conference contribution
SN - 9781612848006
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 3862
EP - 3869
BT - 2011 50th IEEE Conference on Decision and Control and European Control Conference, CDC-ECC 2011
T2 - 2011 50th IEEE Conference on Decision and Control and European Control Conference, CDC-ECC 2011
Y2 - 12 December 2011 through 15 December 2011
ER -