Sub-sampling for multi-armed bandits

Akram Baransi, Odalric Ambrym Maillard, Shie Mannor

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The stochastic multi-armed bandit problem is a popular model of the exploration/exploitation trade-off in sequential decision problems. We introduce a novel algorithm that is based on sub-sampling. Despite its simplicity, we show that the algorithm demonstrates excellent empirical performance against state-of-the-art algorithms, including Thompson sampling and KL-UCB. The algorithm is very flexible: it does not need to know a set of reward distributions in advance nor the range of the rewards. It is not restricted to Bernoulli distributions and is also invariant under rescaling of the rewards. We provide a detailed experimental study comparing the algorithm to the state of the art, the main intuition that explains the striking results, and conclude with a finite-time regret analysis for this algorithm in the simplified two-arm bandit setting.
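
The abstract does not spell out the algorithm's mechanics, so the following is only a minimal illustrative sketch of the sub-sampling idea it alludes to, in the two-arm case: each arm is scored by the empirical mean of a sub-sample of its reward history, with the sub-samples drawn at a common size so the comparison is fair, and the less-sampled arm is favoured on ties. The function name, tie-breaking rule, and sampling details below are assumptions for illustration, not the paper's exact procedure.

```python
import random


def subsample_duel_choice(rewards_a, rewards_b, rng=random):
    """Illustrative sub-sampling duel for a two-arm bandit (sketch only).

    Assumption: compare the two arms via empirical means computed on
    sub-samples of equal size (the size of the shorter history), drawn
    without replacement; ties favour the arm with fewer observations.
    Returns 0 to pull arm A, 1 to pull arm B.
    """
    n_a, n_b = len(rewards_a), len(rewards_b)
    # Pull any arm that has never been tried.
    if n_a == 0:
        return 0
    if n_b == 0:
        return 1
    m = min(n_a, n_b)
    # Equal-size sub-samples make the comparison scale-free in the sample sizes.
    mean_a = sum(rng.sample(rewards_a, m)) / m
    mean_b = sum(rng.sample(rewards_b, m)) / m
    if mean_a == mean_b:
        return 0 if n_a <= n_b else 1  # favour the less-sampled arm
    return 0 if mean_a > mean_b else 1
```

Because the decision depends only on comparisons between sampled means, a sketch like this is invariant under any common rescaling of the rewards and needs no prior knowledge of the reward range, which is consistent with the flexibility claimed in the abstract.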

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014, Proceedings
Pages: 115-131
Number of pages: 17
Edition: PART 1
DOIs
State: Published - 2014
Event: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2014 - Nancy, France
Duration: 15 Sep 2014 - 19 Sep 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 8724 LNAI

Conference

Conference: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2014
Country/Territory: France
City: Nancy
Period: 15/09/14 - 19/09/14

Keywords

  • Multi-armed Bandits
  • Reinforcement Learning
  • Sub-sampling

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
