
Decoupling exploration and exploitation in multi-armed bandits

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We consider a multi-armed bandit problem in which the decision maker can explore one arm and exploit another (possibly different) arm at every round. The exploited arm's reward is added to the decision maker's cumulative reward, without necessarily being observed, while the explored arm's reward is revealed. We devise algorithms for this setup and show that, depending on the behavior of the arms' reward sequences, the dependence on the number of arms, k, can be much better than the standard √k dependence. For the important case of piecewise stationary stochastic bandits, we show a significant improvement over existing algorithms. Our algorithms are based on a non-uniform sampling policy, which we show is essential to the success of any algorithm in the adversarial setup. Finally, we present simulation results in a setting inspired by ultra-wide band channel selection, indicating the applicability of our algorithms.
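The abstract does not spell out the algorithms, so the following is only a minimal sketch of the decoupled protocol under assumptions of our own, not the authors' method. We assume the following setup. An exponential-weights distribution p chooses the exploited arm. A non-uniform exploration distribution q chooses the explored arm, with q_i proportional to √p_i; this particular rule is an illustrative assumption. Only the explored arm's reward is observed, and it is importance-weighted by 1/q. The names `decoupled_exp_weights`, `exploration_distribution`, and the step size `eta` are hypothetical.

```python
import math
import random


def exploration_distribution(p):
    """Hypothetical non-uniform exploration rule (an assumption): q_i proportional to sqrt(p_i)."""
    roots = [math.sqrt(x) for x in p]
    total = sum(roots)
    return [r / total for r in roots]


def decoupled_exp_weights(rewards, k, rounds, eta, seed=0):
    """Illustrative decoupled bandit loop (a sketch, not the paper's algorithm).

    rewards(t) -> list of k rewards in [0, 1], fixed in advance by an oblivious
    adversary; the learner only ever sees the entry for the explored arm.
    Returns (cumulative exploited reward, exploitation distribution of the last round).
    """
    rng = random.Random(seed)
    log_w = [0.0] * k          # log-weights, kept in log space for numerical stability
    p = [1.0 / k] * k
    total_reward = 0.0
    for t in range(rounds):
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        s = sum(w)
        p = [x / s for x in w]                   # exploitation distribution
        q = exploration_distribution(p)          # separate exploration distribution
        exploit = rng.choices(range(k), weights=p)[0]
        explore = rng.choices(range(k), weights=q)[0]
        r = rewards(t)
        # The exploited arm's reward is accrued but never used for learning.
        total_reward += r[exploit]
        # Only the explored arm's reward is observed; r / q is an unbiased gain estimate.
        log_w[explore] += eta * r[explore] / q[explore]
    return total_reward, p
```

For example, with five arms where arm 2 always pays 1 and the others pay 0, calling `decoupled_exp_weights(lambda t: [1.0 if i == 2 else 0.0 for i in range(5)], k=5, rounds=2000, eta=0.01)` shifts almost all exploitation mass onto arm 2. One reason to pick q ∝ √p in this sketch: since Σ√p_j ≤ √k, every arm with p_i < 1/k gets q_i ≥ √(p_i/k) > p_i. Arms the exploiter rarely plays are therefore explored more often than p alone would allow, which keeps the 1/q importance weights from blowing up.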

Original language: English
Title of host publication: Proceedings of the 29th International Conference on Machine Learning, ICML 2012
Pages: 409-416
Number of pages: 8
State: Published - 2012
Event: 29th International Conference on Machine Learning, ICML 2012 - Edinburgh, United Kingdom
Duration: 26 Jun 2012 to 1 Jul 2012

Publication series

Name: Proceedings of the 29th International Conference on Machine Learning, ICML 2012
Volume: 1

Conference

Conference: 29th International Conference on Machine Learning, ICML 2012
Country/Territory: United Kingdom
City: Edinburgh
Period: 26/06/12 to 1/07/12

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Education
