Bandit Regret Scaling with the Effective Loss Range

Nicolò Cesa-Bianchi, Ohad Shamir

Research output: Contribution to journal › Conference article › peer-review

Abstract

We study how the regret guarantees of nonstochastic multi-armed bandits can be improved when the effective range of the losses in each round (for example, the maximal difference between any two losses in a given round) is small. Despite a recent impossibility result, we show that this is achievable under certain mild additional assumptions, such as the availability of rough estimates of the losses, or knowledge of the loss of a single, possibly unspecified, arm at the end of each round. Along the way, we develop a novel technique, which may be of independent interest, for converting any multi-armed bandit algorithm whose regret depends on the loss range into an algorithm whose regret depends only on the effective range, while attaining better regret bounds than existing approaches.
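To illustrate the role of the "rough estimates" assumption, the following sketch runs a multiplicative-weights (Exp3-style) update on shifted losses. It is a hypothetical toy, not the algorithm from the paper: subtracting a rough per-round estimate `m_t` from each loss vector means the update only sees values on the scale of the effective range, rather than the full loss range. Full-information feedback is simulated for simplicity; a true bandit version would importance-weight the single observed loss.

```python
import math
import random

def exp3_shifted(losses, estimates, eta, seed=0):
    """Multiplicative-weights update on shifted losses l_t - m_t.

    Illustrative sketch only (not the paper's algorithm). Subtracting a
    rough per-round estimate m_t means the update sees values whose
    magnitude is bounded by the effective range plus the estimation
    error, instead of the full loss range.

    losses, estimates: lists of per-round lists, each of shape T x K.
    Returns the total loss incurred by the sampled arms.
    """
    rng = random.Random(seed)
    K = len(losses[0])
    logw = [0.0] * K  # log-weights, for numerical stability
    total = 0.0
    for l_t, m_t in zip(losses, estimates):
        mx = max(logw)
        w = [math.exp(x - mx) for x in logw]
        s = sum(w)
        p = [x / s for x in w]
        arm = rng.choices(range(K), weights=p)[0]
        total += l_t[arm]
        # Shifted update: the quantity l - m lives on the effective range,
        # so a larger learning rate eta can safely be used.
        logw = [x - eta * (l - m) for x, l, m in zip(logw, l_t, m_t)]
    return total
```

With losses clustered around a large constant (say 5.0) but an effective range of only 0.1, the shifted update concentrates on the best arm quickly; running the same update on the raw losses would force a much smaller learning rate.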

Original language: English
Pages (from-to): 128-151
Number of pages: 24
Journal: Proceedings of Machine Learning Research
Volume: 83
State: Published - 2018
Event: 29th International Conference on Algorithmic Learning Theory, ALT 2018 - Lanzarote, Spain
Duration: 7 Apr 2018 → 9 Apr 2018

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability