Abstract
We study the problem of best arm identification with risk constraints within the setting of fixed-confidence pure-exploration bandits (PAC bandits). The goal is to stop as fast as possible and, with high confidence, return an arm whose mean is ε-close to that of the best arm among those that satisfy a risk constraint, namely that their α-quantile functions are larger than a threshold β. For this risk-sensitive bandit problem, we propose an algorithm and prove an upper bound on its sample complexity for the general case of sub-Gaussian arm distributions. We also prove a lower bound for this general case, which shows that our upper bound is near-optimal (up to logarithmic factors). Our upper and lower bounds have a similar form to the risk-neutral PAC bandit results of Even-Dar et al. (2006) and Mannor and Tsitsiklis (2004), respectively. We also prove a lower bound for our problem when the arm distributions are Gaussian; it is smaller than our general lower bound, but stronger in the sense that it applies to any instance of the (Gaussian) problem. This lower bound is stated in terms of the KL divergence and behaves similarly to the risk-neutral PAC bandit results of Kaufmann et al. (2016).
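The goal described in the abstract can be written out as follows. This is only a sketch: the symbols $K$, $\mu_i$, $q_i(\alpha)$, $\mathcal{F}$, $\hat{\imath}$, $\tau$ and the confidence parameter $\delta$ are notation assumed here for illustration, not taken from the paper, and the paper's exact definition (for example, any slack allowed on the risk constraint) may differ.

```latex
% Assumed notation: K arms, arm i has mean \mu_i and \alpha-quantile q_i(\alpha).
% Feasible arms: those satisfying the risk constraint with threshold \beta.
\mathcal{F} = \{\, i \in \{1,\dots,K\} : q_i(\alpha) > \beta \,\}

% PAC-style guarantee: at the stopping time \tau the returned arm \hat{\imath}
% is feasible and \epsilon-close in mean to the best feasible arm,
% with probability at least 1 - \delta (\delta is an assumed symbol).
\Pr\!\left( \hat{\imath} \in \mathcal{F}
  \ \text{and}\ 
  \mu_{\hat{\imath}} \ge \max_{j \in \mathcal{F}} \mu_j - \epsilon \right)
  \ge 1 - \delta
```

Under this reading, the sample complexity is the number of arm pulls made before the stopping time $\tau$, which the algorithm tries to keep as small as possible.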
| Original language | English |
|---|---|
| Title of host publication | 2018 International Symposium on Artificial Intelligence and Mathematics, ISAIM 2018 |
| Number of pages | 9 |
| State | Published - 2018 |
| Event | 2018 International Symposium on Artificial Intelligence and Mathematics, ISAIM 2018, Fort Lauderdale, United States. Duration: 3 Jan 2018 → 5 Jan 2018 |
Conference
| Conference | 2018 International Symposium on Artificial Intelligence and Mathematics, ISAIM 2018 |
|---|---|
| Country/Territory | United States |
| City | Fort Lauderdale |
| Period | 3/01/18 → 5/01/18 |
All Science Journal Classification (ASJC) codes
- Applied Mathematics
- Artificial Intelligence