TY - GEN
T1 - Infinitely many-armed bandits with unknown value distribution
AU - David, Yahel
AU - Shimkin, Nahum
PY - 2014
Y1 - 2014
N2 - We consider a version of the classical stochastic Multi-Armed bandit problem in which the number of arms is large compared to the time horizon, with the goal of minimizing the cumulative regret. Here, the mean-reward (or value) of newly chosen arms is assumed to be i.i.d. We further make the simplifying assumption that the value of an arm is revealed once this arm is chosen. We present a general lower bound on the regret, and learning algorithms that achieve this bound up to a logarithmic factor. Contrary to previous work, we do not assume that the functional form of the tail of the value distribution is known. Furthermore, we also consider a variant of our model where sampled arms are non-retainable, namely are lost if not used continuously, with similar near-optimality results.
AB - We consider a version of the classical stochastic Multi-Armed bandit problem in which the number of arms is large compared to the time horizon, with the goal of minimizing the cumulative regret. Here, the mean-reward (or value) of newly chosen arms is assumed to be i.i.d. We further make the simplifying assumption that the value of an arm is revealed once this arm is chosen. We present a general lower bound on the regret, and learning algorithms that achieve this bound up to a logarithmic factor. Contrary to previous work, we do not assume that the functional form of the tail of the value distribution is known. Furthermore, we also consider a variant of our model where sampled arms are non-retainable, namely are lost if not used continuously, with similar near-optimality results.
UR - http://www.scopus.com/inward/record.url?scp=84907009837&partnerID=8YFLogxK
U2 - 10.1007/978-3-662-44848-9_20
DO - 10.1007/978-3-662-44848-9_20
M3 - منشور من مؤتمر
SN - 9783662448472
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 307
EP - 322
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2014, Proceedings
T2 - European Conference on Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2014
Y2 - 15 September 2014 through 19 September 2014
ER -