A General Approach to Multi-Armed Bandits Under Risk Criteria

Asaf Cassel, Shie Mannor, Assaf Zeevi

Research output: Contribution to journal › Conference article › peer-review

Abstract

A variety of risk-related criteria have recently attracted interest in learning problems, where typically each case is treated in a customized manner. In this paper we provide a more systematic approach to analyzing such risk criteria within a stochastic multi-armed bandit (MAB) formulation. We identify a set of general conditions that yield a simple characterization of the oracle rule (which serves as the regret benchmark) and facilitate the design of upper confidence bound (UCB) learning policies. The conditions are derived from problem primitives, primarily focusing on the relation between the arm reward distributions and the (risk criteria) performance metric. Among other things, the work highlights some (possibly non-intuitive) subtleties that differentiate various criteria in conjunction with statistical properties of the arms. Our main findings are illustrated on several widely used objectives such as conditional value-at-risk, mean-variance, Sharpe ratio, and more.
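To make the setting concrete, the following is a minimal illustrative sketch of a UCB-style bandit policy that ranks arms by a risk criterion rather than the mean: here, the empirical conditional value-at-risk (CVaR) of each arm's observed rewards, plus a standard exploration bonus. The CVaR convention (mean of the worst alpha-fraction of rewards), the bonus form, and all function names are illustrative assumptions, not the paper's exact construction or confidence radii.

```python
import math
import random

def empirical_cvar(samples, alpha=0.1):
    # Empirical CVaR: mean of the worst alpha-fraction of observed
    # rewards (one common convention; chosen here for illustration).
    k = max(1, int(math.ceil(alpha * len(samples))))
    worst = sorted(samples)[:k]
    return sum(worst) / k

def ucb_risk_index(samples, t, c=1.0):
    # UCB-style index: empirical risk criterion plus an exploration
    # bonus of the standard sqrt(log t / n) form (an assumption, not
    # the paper's exact confidence radius).
    n = len(samples)
    return empirical_cvar(samples) + c * math.sqrt(2.0 * math.log(t) / n)

def run_bandit(arms, horizon, seed=0):
    # Pull each arm once, then repeatedly pull the arm with the
    # highest risk-adjusted UCB index; return pull counts per arm.
    rng = random.Random(seed)
    history = [[arm(rng)] for arm in arms]
    for t in range(len(arms) + 1, horizon + 1):
        i = max(range(len(arms)),
                key=lambda a: ucb_risk_index(history[a], t))
        history[i].append(arms[i](rng))
    return [len(h) for h in history]
```

Under a CVaR objective, a low-variance arm can dominate an arm with a higher mean but a heavy lower tail, which is exactly the kind of criterion-dependent oracle behavior the paper's general conditions are meant to capture.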

Original language: English
Pages (from-to): 1295-1306
Number of pages: 12
Journal: Proceedings of Machine Learning Research
Volume: 75
State: Published - 2018
Event: 31st Annual Conference on Learning Theory, COLT 2018 - Stockholm, Sweden
Duration: 6 Jul 2018 - 9 Jul 2018

Keywords

  • Multi-Armed Bandit
  • Upper Confidence Bound
  • planning
  • reinforcement learning
  • risk

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
  • Control and Systems Engineering
  • Statistics and Probability
