Sequential Decision Making with Coherent Risk

Aviv Tamar, Yinlam Chow, Mohammad Ghavamzadeh, Shie Mannor

Research output: Contribution to journal › Article › peer-review

Abstract

We provide sampling-based algorithms for optimization under a coherent risk objective. The class of coherent risk measures is widely accepted in finance and operations research, among other fields, and encompasses popular risk measures such as conditional value at risk and mean-semideviation. Our approach is suitable for problems in which tunable parameters control the distribution of the cost, such as in reinforcement learning or approximate dynamic programming with a parameterized policy. Such problems cannot be solved using previous approaches. We consider both static risk measures and time-consistent dynamic risk measures. For static risk measures, our approach is in the spirit of policy gradient methods, while for dynamic risk measures, we use actor-critic type algorithms.
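As a rough illustration of the two risk measures named in the abstract, the following sketch estimates them from a batch of sampled costs. It is not the algorithm from the article: the function names, the convention that `alpha` is the tail fraction for conditional value at risk, the weight `beta` for mean-semideviation, and the simple sort-based estimator are all assumptions made for this example.

```python
import math


def empirical_cvar(costs, alpha):
    """Sample estimate of conditional value at risk (assumed convention:
    the mean of the worst alpha-fraction of costs, higher cost = worse)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    n = len(costs)
    if n == 0:
        raise ValueError("costs must be non-empty")
    # Number of tail samples; the small offset guards against float
    # round-up such as 0.1 * 30 evaluating slightly above 3.
    k = max(1, math.ceil(alpha * n - 1e-9))
    worst = sorted(costs, reverse=True)[:k]
    return sum(worst) / k


def empirical_mean_semideviation(costs, beta):
    """Sample estimate of mean plus beta times the upper semideviation,
    i.e. mean + beta * sqrt(E[max(Z - mean, 0)^2])."""
    n = len(costs)
    if n == 0:
        raise ValueError("costs must be non-empty")
    mean = sum(costs) / n
    upper_sq = sum(max(c - mean, 0.0) ** 2 for c in costs) / n
    return mean + beta * math.sqrt(upper_sq)


if __name__ == "__main__":
    # Hypothetical sampled costs, e.g. from rollouts of a parameterized policy.
    sampled_costs = [1.0, 2.0, 3.0, 4.0, 10.0]
    print(empirical_cvar(sampled_costs, alpha=0.4))
    print(empirical_mean_semideviation(sampled_costs, beta=0.5))
```

With `alpha=1.0` the first estimate reduces to the plain sample mean, and with `beta=0.0` so does the second, which is a quick sanity check that both measures only add a penalty for the upper tail of the cost.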

Original language: English
Article number: 7797146
Pages (from-to): 3323-3338
Number of pages: 16
Journal: IEEE Transactions on Automatic Control
Volume: 62
Issue number: 7
State: Published - Jul 2017

Keywords

  • Coherent risk
  • Markov decision processes
  • dynamic programming
  • policy gradient

All Science Journal Classification (ASJC) codes

  • Control and Systems Engineering
  • Computer Science Applications
  • Electrical and Electronic Engineering
