A note on performance limitations in bandit problems with side information

Research output: Contribution to journal › Article › peer-review

Abstract

We consider a sequential adaptive allocation problem that is formulated as a traditional two-armed bandit problem, but with one important modification: at each time step t, before selecting which arm to pull, the decision maker has access to a random variable χ_t that provides information on the reward of each arm. Performance is measured as the fraction of time an inferior arm (one generating a lower mean reward) is pulled. We derive a minimax lower bound proving that, in the absence of sufficient statistical "diversity" in the distribution of the covariate χ, a property we refer to as lack of persistent excitation, no policy can improve on the best achievable performance in the traditional bandit problem without side information.
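The performance measure described above, the inferior sampling rate, can be made concrete with a small simulation. The sketch below is illustrative only and rests on assumptions that are not taken from the paper: the covariates are given as a fixed sequence, each arm's conditional mean reward is a hypothetical function of the covariate, and rewards carry Gaussian noise. It counts the rounds in which the pulled arm's conditional mean, given that round's χ_t, is strictly below the other arm's, and divides by the number of rounds. The policies `always_arm_zero` and `sample_mean_greedy` (a traditional rule that ignores the covariate) are hypothetical examples. They are not the policies analyzed in the paper, and the sketch does not reproduce the lower bound itself.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical types: a history is a list of (covariate, arm, reward) triples,
# and a policy maps (current covariate, history) to an arm in {0, 1}.
History = List[Tuple[float, int, float]]
Policy = Callable[[float, History], int]


def inferior_sampling_rate(
    policy: Policy,
    arm_means: Sequence[Callable[[float], float]],
    covariates: Sequence[float],
    noise_sd: float = 1.0,
    seed: int = 0,
) -> float:
    """Fraction of rounds in which `policy` pulls an arm whose conditional mean
    reward, given that round's covariate, is strictly below the other arm's.
    Assumes Gaussian reward noise (an illustrative choice)."""
    rng = random.Random(seed)
    history: History = []
    inferior = 0
    for x in covariates:
        arm = policy(x, history)  # the decision is made after observing x
        means = [m(x) for m in arm_means]
        if means[arm] < max(means):
            inferior += 1
        # Only the pulled arm's noisy reward is revealed to the policy.
        reward = means[arm] + rng.gauss(0.0, noise_sd)
        history.append((x, arm, reward))
    return inferior / len(covariates)


def always_arm_zero(x: float, history: History) -> int:
    """Hypothetical baseline that always pulls arm 0."""
    return 0


def sample_mean_greedy(x: float, history: History) -> int:
    """Hypothetical rule that ignores the covariate: tries each arm once, then
    pulls the arm with the higher sample-mean reward (ties go to arm 0)."""
    sums, counts = [0.0, 0.0], [0, 0]
    for _, arm, r in history:
        sums[arm] += r
        counts[arm] += 1
    for arm in (0, 1):
        if counts[arm] == 0:
            return arm
    return 0 if sums[0] / counts[0] >= sums[1] / counts[1] else 1


if __name__ == "__main__":
    # Hypothetical arm means: arm 0 is superior exactly when x > 0.5.
    means = [lambda x: x, lambda x: 0.5]
    rng = random.Random(1)
    xs = [rng.random() for _ in range(500)]
    # An oracle that knows the mean functions never pulls an inferior arm.
    oracle = lambda x, h: 0 if means[0](x) >= means[1](x) else 1
    for name, pol in [("greedy", sample_mean_greedy), ("oracle", oracle)]:
        print(name, inferior_sampling_rate(pol, means, xs))
```

In this toy setup, a rule that ignores χ_t must commit to a single arm and so keeps pulling the inferior arm on part of the covariate range. The oracle shows the best case. The abstract's result concerns the harder question of when a learning policy can use χ_t to improve on the traditional bandit rate.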

Original language: American English
Article number: 5714284
Pages (from-to): 1707-1713
Number of pages: 7
Journal: IEEE Transactions on Information Theory
Volume: 57
Issue number: 3
State: Published - Mar 2011

Keywords

  • allocation rule
  • inferior sampling rate
  • lower bound
  • side information
  • two-armed bandit

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences
