TY - JOUR
T1 - A note on performance limitations in bandit problems with side information
AU - Goldenshluger, Alexander
AU - Zeevi, Assaf
N1 - Funding Information: Manuscript received October 31, 2008; revised August 05, 2010; accepted August 05, 2010. Date of current version February 18, 2011. This work was supported in part by the NSF by Grant DMI-0447562 and in part by the US-Israel Binational Science Foundation (BSF) by Grant #2006075. A. Zeevi is with the Graduate School of Business, Columbia University, New York, NY 10027 USA. A. Goldenshluger is with the Department of Statistics, Haifa University, Haifa 31905 Israel. Communicated by A. Krzyzak, Associate Editor for Pattern Recognition, Statistical Learning, and Inference. Digital Object Identifier 10.1109/TIT.2011.2104450
PY - 2011/3
Y1 - 2011/3
N2 - We consider a sequential adaptive allocation problem which is formulated as a traditional two-armed bandit problem but with one important modification: at each time step t, before selecting which arm to pull, the decision maker has access to a random variable χ_t which provides information on the reward in each arm. Performance is measured as the fraction of time an inferior arm (generating lower mean reward) is pulled. We derive a minimax lower bound that proves that in the absence of sufficient statistical "diversity" in the distribution of the covariate χ, a property that we shall refer to as lack of persistent excitation, no policy can improve on the best achievable performance in the traditional bandit problem without side information.
AB - We consider a sequential adaptive allocation problem which is formulated as a traditional two-armed bandit problem but with one important modification: at each time step t, before selecting which arm to pull, the decision maker has access to a random variable χ_t which provides information on the reward in each arm. Performance is measured as the fraction of time an inferior arm (generating lower mean reward) is pulled. We derive a minimax lower bound that proves that in the absence of sufficient statistical "diversity" in the distribution of the covariate χ, a property that we shall refer to as lack of persistent excitation, no policy can improve on the best achievable performance in the traditional bandit problem without side information.
KW - Allocation rule
KW - inferior sampling rate
KW - lower bound
KW - side information
KW - two-armed bandit
UR - http://www.scopus.com/inward/record.url?scp=79951890373&partnerID=8YFLogxK
U2 - 10.1109/TIT.2011.2104450
DO - 10.1109/TIT.2011.2104450
M3 - Article
SN - 0018-9448
VL - 57
SP - 1707
EP - 1713
JO - IEEE Transactions on Information Theory
JF - IEEE Transactions on Information Theory
IS - 3
M1 - 5714284
ER -