PAC lower bounds and efficient algorithms for the Max K-Armed Bandit problem

Yahel David, Nahum Shimkin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We consider the Max K-Armed Bandit problem, where a learning agent is faced with several stochastic arms, each a source of i.i.d. rewards of unknown distribution. At each time step the agent chooses an arm, and observes the reward of the obtained sample. Each sample is considered here as a separate item with the reward designating its value, and the goal is to find an item with the highest possible value. Our basic assumption is a known lower bound on the tail function of the reward distributions. Under the PAC framework, we provide a lower bound on the sample complexity of any (ε,δ)-correct algorithm, and propose an algorithm that attains this bound up to logarithmic factors. We provide an analysis of the robustness of the proposed algorithm to the model assumptions, and further compare its performance to the simple non-adaptive variant, in which the arms are chosen randomly at each stage.
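As an illustration of the setting only, the following minimal sketch simulates the simple non-adaptive variant mentioned in the abstract: an arm is picked uniformly at random at each step, and the best reward observed so far is kept. It is not the algorithm proposed in the paper. The function name `uniform_max_bandit`, the three example reward distributions, and the sampling budget are all assumptions made for this example.

```python
import random


def uniform_max_bandit(arms, budget, seed=0):
    """Non-adaptive baseline: sample a uniformly random arm at each step
    and return the highest reward seen, with the arm that produced it."""
    rng = random.Random(seed)
    best_value = float("-inf")
    best_arm = None
    for _ in range(budget):
        i = rng.randrange(len(arms))   # choose an arm uniformly at random
        reward = arms[i](rng)          # draw one i.i.d. reward from that arm
        if reward > best_value:        # keep the single best item found
            best_value, best_arm = reward, i
    return best_value, best_arm


# Hypothetical arms (assumed for illustration): each callable draws one reward.
arms = [
    lambda rng: rng.uniform(0.0, 0.5),
    lambda rng: rng.uniform(0.0, 1.0),
    lambda rng: rng.betavariate(2.0, 5.0),
]

best_value, best_arm = uniform_max_bandit(arms, budget=1000)
print(f"best value {best_value:.4f} came from arm {best_arm}")
```

Unlike the classical bandit objective of maximizing cumulative reward, only the single best sample matters here, which is why the sketch tracks a running maximum rather than a sum.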

Original language: English
Title of host publication: 33rd International Conference on Machine Learning, ICML 2016
Editors: Maria Florina Balcan, Kilian Q. Weinberger
Pages: 1389-1401
Number of pages: 13
ISBN (Electronic): 9781510829008
State: Published - 2016
Event: 33rd International Conference on Machine Learning, ICML 2016 - New York City, United States
Duration: 19 Jun 2016 to 24 Jun 2016

Publication series

Name: 33rd International Conference on Machine Learning, ICML 2016
Volume: 2

Conference

Conference: 33rd International Conference on Machine Learning, ICML 2016
Country/Territory: United States
City: New York City
Period: 19/06/16 to 24/06/16

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
  • Computer Networks and Communications
