TY - CONF
T1 - Restless Hidden Markov Bandit with Linear Rewards
AU - Yemini, Michal
AU - Leshem, Amir
AU - Somekh-Baruch, Anelia
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/12/14
Y1 - 2020/12/14
AB - This paper presents an algorithm and regret analysis for the restless hidden Markov bandit problem with linear rewards. In this problem the reward received by the decision maker is a random linear function which depends on the selected arm and a hidden state. In contrast to previous works on Markovian bandits, we do not assume that the decision maker receives information regarding the state of the system; it can only infer or estimate the state based on its actions and the received rewards. Additionally, it is assumed that the decision maker knows in advance that the reward is a random linear function which depends on the selected arm, the action, and the hidden states. However, the decision maker does not know in advance the probability distributions of these hidden states; thus we refer to this side information as structural side information. Surprisingly, we can still maintain logarithmic regret in the case of a polyhedral action set. Furthermore, we show that the structural side information leads to an expected regret that does not depend on the number of extreme points of the action space.
UR - http://www.scopus.com/inward/record.url?scp=85099877827&partnerID=8YFLogxK
DO - 10.1109/cdc42340.2020.9304511
M3 - Conference contribution
T3 - Proceedings of the IEEE Conference on Decision and Control
SP - 1183
EP - 1189
BT - 2020 59th IEEE Conference on Decision and Control, CDC 2020
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 59th IEEE Conference on Decision and Control, CDC 2020
Y2 - 14 December 2020 through 18 December 2020
ER -
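
Below is a minimal, hypothetical Python sketch of the environment described in the abstract, not the authors' algorithm: each arm carries a restless two-state hidden Markov chain that is never observed, and the reward is a random linear function of the chosen action whose coefficients are set by the selected arm's hidden state. All parameter values, names, and the uniform placeholder policy are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Two restless arms, each with its own two-state hidden Markov chain.
# "Restless": every chain evolves at every step, whether or not its arm
# is selected, and the decision maker never observes the states.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])                    # common transition matrix (assumed)
theta = np.array([[[1.0, 0.2], [0.1, 0.9]],   # arm 0: reward vector per state
                  [[0.5, 0.5], [0.8, 0.1]]])  # arm 1: reward vector per state

# Polyhedral action set in R^2; here, its extreme points (assumed).
extreme_points = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]

states = np.array([0, 0])                     # hidden states, one per arm
for t in range(10):
    arm = int(rng.integers(2))                # placeholder policy: uniform
    a = extreme_points[rng.integers(len(extreme_points))]
    # Reward is a random linear function of the action, with coefficients
    # determined by the selected arm's hidden state, plus noise.
    reward = theta[arm, states[arm]] @ a + 0.1 * rng.normal()
    # Only (arm, a, reward) is observed; the states must be inferred.
    print(f"t={t} arm={arm} action={a} reward={reward:.3f}")
    # All chains evolve (restless), independently of the chosen arm.
    states = np.array([rng.choice(2, p=P[s]) for s in states])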