Committing bandits

Loc Bui, Ramesh Johari, Shie Mannor

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We consider a multi-armed bandit problem where there are two phases. The first phase is an experimentation phase where the decision maker is free to explore multiple options. In the second phase the decision maker has to commit to one of the arms and stick with it. Cost is incurred during both phases with a higher cost during the experimentation phase. We analyze the regret in this setup, and both propose algorithms and provide upper and lower bounds that depend on the ratio of the duration of the experimentation phase to the duration of the commitment phase. Our analysis reveals that if given the choice, it is optimal to experiment Θ(ln T ) steps and then commit, where T is the time horizon.

Original languageEnglish
Title of host publicationAdvances in Neural Information Processing Systems 24
Subtitle of host publication25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
StatePublished - 2011
Event25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011 - Granada, Spain
Duration: 12 Dec 201114 Dec 2011

Publication series

NameAdvances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011

Conference

Conference25th Annual Conference on Neural Information Processing Systems 2011, NIPS 2011
Country/TerritorySpain
CityGranada
Period12/12/1114/12/11

All Science Journal Classification (ASJC) codes

  • Information Systems

Fingerprint

Dive into the research topics of 'Committing bandits'. Together they form a unique fingerprint.

Cite this