Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions

Tal Lancewicki, Shahar Segal, Tomer Koren, Yishay Mansour

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We study the stochastic Multi-Armed Bandit (MAB) problem with random delays in the feedback received by the algorithm. We consider two settings: the reward-dependent delay setting, where realized delays may depend on the stochastic rewards, and the reward-independent delay setting. Our main contribution is algorithms that achieve near-optimal regret in each of the settings, with an additional additive dependence on the quantiles of the delay distribution. Our results do not make any assumptions on the delay distributions: in particular, we do not assume they come from any parametric family of distributions and allow for unbounded support and expectation; we further allow for infinite delays where the algorithm might occasionally not observe any feedback.

Original languageEnglish
Title of host publicationProceedings of the 38th International Conference on Machine Learning, ICML 2021
PublisherML Research Press
Pages5969-5978
Number of pages10
ISBN (Electronic)9781713845065
StatePublished - 2021
Event38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: 18 Jul 202124 Jul 2021

Publication series

NameProceedings of Machine Learning Research
Volume139

Conference

Conference38th International Conference on Machine Learning, ICML 2021
CityVirtual, Online
Period18/07/2124/07/21

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions'. Together they form a unique fingerprint.

Cite this