Infinitely many-armed bandits with unknown value distribution

Yahel David, Nahum Shimkin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We consider a version of the classical stochastic multi-armed bandit problem in which the number of arms is large compared to the time horizon, with the goal of minimizing the cumulative regret. Here, the mean rewards (or values) of newly chosen arms are assumed to be i.i.d. We further make the simplifying assumption that the value of an arm is revealed once that arm is chosen. We present a general lower bound on the regret, together with learning algorithms that achieve this bound up to a logarithmic factor. Contrary to previous work, we do not assume that the functional form of the tail of the value distribution is known. Furthermore, we also consider a variant of our model in which sampled arms are non-retainable, namely, they are lost if not used continuously, with similar near-optimality results.
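To make the setting concrete, the following minimal sketch simulates the model described in the abstract: fresh arms have i.i.d. values, and an arm's value is revealed as soon as it is chosen. The policy shown (draw new arms until one exceeds a fixed threshold, then commit) and the Uniform[0,1] value distribution are illustrative assumptions, not the paper's algorithm; the paper's point is precisely that the tail of the value distribution is unknown.

```python
import random

def simulate_threshold_policy(horizon, threshold, seed=0):
    """Simulate an infinitely-many-armed bandit under the abstract's model.

    Each newly drawn arm has an i.i.d. mean value, here Uniform[0,1]
    (an illustrative assumption). Because an arm's value is revealed
    once it is chosen, a simple baseline policy is: keep drawing fresh
    arms until one exceeds `threshold`, then play the best arm found
    for the rest of the horizon. Returns (total reward, regret).
    """
    rng = random.Random(seed)
    total = 0.0
    best = None  # best arm value seen so far (arms are retainable here)
    for _ in range(horizon):
        if best is None or best < threshold:
            # Explore: draw a fresh arm; its value is revealed immediately.
            arm_value = rng.random()
            best = arm_value if best is None else max(best, arm_value)
            total += arm_value
        else:
            # Exploit: commit to the retained good arm.
            total += best
    # Regret is measured against always playing a value-1 arm
    # (the essential supremum of Uniform[0,1]).
    regret = horizon - total
    return total, regret
```

Raising the threshold trades longer exploration (more low-value draws) for a better committed arm; the paper analyzes how to balance this trade-off without knowing the tail of the value distribution.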

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014, Proceedings
Pages: 307-322
Number of pages: 16
Edition: PART 1
DOIs
State: Published - 2014
Event: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2014 - Nancy, France
Duration: 15 Sep 2014 – 19 Sep 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 8724 LNAI

Conference

Conference: European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2014
Country/Territory: France
City: Nancy
Period: 15/09/14 – 19/09/14

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science
