PAC bandits with risk constraints

Yahel David, Balázs Szörényi, Mohammad Ghavamzadeh, Shie Mannor, Nahum Shimkin

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We study the problem of best arm identification with risk constraints within the setting of fixed-confidence pure exploration bandits (PAC bandits). The goal is to stop as fast as possible, and with high confidence return an arm whose mean is ε-close to the best arm among those that satisfy a risk constraint, namely their α-quantile functions are larger than a threshold β. For this risk-sensitive bandit problem, we propose an algorithm and prove an upper bound on its sample complexity for the general case of sub-Gaussian arm distributions. We also prove a lower bound for this general case that shows our derived upper bound is near-optimal (up to logarithmic factors). Both our upper and lower bounds have a similar form to the risk-neutral PAC bandits results of (Even-Dar et al. 2006) and (Mannor and Tsitsiklis 2004), respectively. We also prove a lower bound for our problem when the arm distributions are Gaussian, which is smaller than our general lower bound, but is stronger in the sense that it applies to any instance of the (Gaussian) problem. This lower bound is in terms of the KL divergence and has similar behavior to the risk-neutral PAC bandits results of (Kaufmann et al. 2016).
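To make the target of the problem concrete, the following is a minimal sketch of which arms count as acceptable answers, assuming Gaussian arms whose means and standard deviations are already known. It is not the algorithm proposed in the paper: in the bandit setting these parameters are unknown and must be estimated from samples. The function name `acceptable_arms` and the `(mean, std)` arm representation are hypothetical, chosen only for illustration.

```python
from statistics import NormalDist


def acceptable_arms(arms, alpha, beta, eps):
    """Return indices of arms that would be valid outputs.

    arms  : list of (mean, std) pairs, one per Gaussian arm (assumed known here)
    alpha : quantile level of the risk constraint
    beta  : threshold the alpha-quantile must exceed
    eps   : allowed gap to the best feasible mean
    """
    # An arm is feasible if its alpha-quantile is larger than beta.
    feasible = [
        i for i, (mu, sigma) in enumerate(arms)
        if NormalDist(mu, sigma).inv_cdf(alpha) > beta
    ]
    if not feasible:
        return []
    # Best mean among the feasible arms only.
    best_mean = max(arms[i][0] for i in feasible)
    # Any feasible arm within eps of that mean is an acceptable answer.
    return [i for i in feasible if arms[i][0] >= best_mean - eps]


# Arm 1 has the highest mean but violates the risk constraint.
arms = [(1.0, 0.1), (2.0, 5.0), (0.95, 0.1), (0.5, 0.1)]
result = acceptable_arms(arms, alpha=0.05, beta=0.0, eps=0.1)
```

In this example the high-mean, high-variance arm is excluded by the quantile constraint, so the acceptable set is determined by the remaining arms alone.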

Original language: English
Title of host publication: 2018 International Symposium on Artificial Intelligence and Mathematics, ISAIM 2018
Number of pages: 9
State: Published - 2018
Event: 2018 International Symposium on Artificial Intelligence and Mathematics, ISAIM 2018 - Fort Lauderdale, United States
Duration: 3 Jan 2018 → 5 Jan 2018

Conference

Conference: 2018 International Symposium on Artificial Intelligence and Mathematics, ISAIM 2018
Country/Territory: United States
City: Fort Lauderdale
Period: 3/01/18 → 5/01/18

All Science Journal Classification (ASJC) codes

  • Applied Mathematics
  • Artificial Intelligence
