PAC Algorithms for the Infinitely-Many Armed Problem with Multiple Pools

Nahum Shimkin, Yahel David

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We consider a multi-pool version of the infinitely-many armed bandit problem, where a
learning agent is faced with several large pools of items and is interested in finding the best
item overall. At each time step the agent chooses a pool, and obtains a random item whose
value is precisely revealed. The obtained values within each pool are assumed to be i.i.d.,
with an unknown probability distribution that generally differs among the pools. Under the
PAC framework, we provide lower bounds on the sample complexity of any (ε, δ)-correct
algorithm, and propose an algorithm that attains this bound up to logarithmic factors. We
compare the performance of this multi-pool algorithm to the variant in which the pools are
not distinguishable by the agent and are chosen randomly at each stage. Interestingly, when
the supremal values of the pools happen to be similar, the latter approach may provide
better performance.
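
The interaction model described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the algorithm proposed in the paper: the pools are hypothetical uniform distributions, both sampling routines are naive fixed-budget baselines, and an item is taken to be ε-optimal when its value is within ε of the largest pool supremum. All function names are illustrative.

```python
import random


def make_uniform_pool(upper):
    """Hypothetical pool: item values are i.i.d. uniform on [0, upper]."""
    def draw(rng):
        return rng.uniform(0.0, upper)
    return draw


def best_with_known_pools(pools, budget, rng):
    """Naive baseline (assumption): cycle through distinguishable pools."""
    best = float("-inf")
    for t in range(budget):
        pool = pools[t % len(pools)]  # round-robin pool choice
        best = max(best, pool(rng))   # the item's value is revealed exactly
    return best


def best_with_random_pools(pools, budget, rng):
    """Variant with indistinguishable pools: a pool is drawn at random each step."""
    best = float("-inf")
    for _ in range(budget):
        pool = rng.choice(pools)
        best = max(best, pool(rng))
    return best


def is_epsilon_optimal(value, suprema, epsilon):
    """Assumed criterion: value is within epsilon of the largest pool supremum."""
    return value >= max(suprema) - epsilon


# Example run with three hypothetical pools and a fixed seed.
suprema = [0.6, 0.8, 1.0]
pools = [make_uniform_pool(s) for s in suprema]
rng = random.Random(0)
value_known = best_with_known_pools(pools, 300, rng)
value_random = best_with_random_pools(pools, 300, rng)
```

Repeating such runs over many seeds and counting how often `is_epsilon_optimal` holds gives an empirical estimate of the failure probability that δ is meant to bound.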
Original language: American English
Title of host publication: The 12th European Workshop on Reinforcement Learning
State: Published - 2015
Event: The 12th European Workshop on Reinforcement Learning - Lille, France
Duration: 10 Jul 2015 → 11 Jul 2015
Conference number: 12
https://ewrl.wordpress.com/past-ewrl/ewrl12-2015/

Conference

Conference: The 12th European Workshop on Reinforcement Learning
Abbreviated title: EWRL
Period: 10/07/15 → 11/07/15