Abstract
We provide sampling-based algorithms for optimization under a coherent risk objective. The class of coherent risk measures is widely accepted in finance and operations research, among other fields, and encompasses popular risk measures such as conditional value at risk and mean-semideviation. Our approach is suitable for problems in which tunable parameters control the distribution of the cost, such as in reinforcement learning or approximate dynamic programming with a parameterized policy. Such problems cannot be solved using previous approaches. We consider both static risk measures and time-consistent dynamic risk measures. For static risk measures, our approach is in the spirit of policy gradient methods, while for dynamic risk measures, we use actor-critic type algorithms.
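To make the two risk measures named in the abstract concrete, the following is a minimal sketch of how each could be estimated from a batch of sampled costs. It is not the paper's algorithm: the function names, the convention that higher cost is worse, the tail fraction `alpha`, the weight `beta`, and the order-2 form of the mean-semideviation are all assumptions made for illustration.

```python
import math


def empirical_cvar(costs, alpha):
    """Sample estimate of conditional value at risk (hypothetical helper).

    Assumes higher cost is worse and returns the mean of the worst
    alpha-fraction of the samples, with 0 < alpha <= 1.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    ordered = sorted(costs, reverse=True)  # worst (largest) costs first
    # Small tolerance guards against float round-up in alpha * n.
    k = max(1, math.ceil(alpha * len(ordered) - 1e-12))
    return sum(ordered[:k]) / k


def empirical_mean_semideviation(costs, beta):
    """Sample estimate of mean plus beta times the upper semideviation.

    Assumes the order-2 form: mean + beta * sqrt(E[(Z - mean)_+^2]),
    with 0 <= beta <= 1.
    """
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    mean = sum(costs) / len(costs)
    # Only deviations above the mean (bad outcomes) are penalized.
    upper = sum(max(c - mean, 0.0) ** 2 for c in costs) / len(costs)
    return mean + beta * math.sqrt(upper)
```

Under these assumptions, setting `alpha=1.0` or `beta=0.0` reduces either estimate to the plain sample mean, which is the risk-neutral objective.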
| Original language | English |
|---|---|
| Article number | 7797146 |
| Pages (from-to) | 3323-3338 |
| Number of pages | 16 |
| Journal | IEEE Transactions on Automatic Control |
| Volume | 62 |
| Issue number | 7 |
| State | Published - Jul 2017 |
Keywords
- Coherent risk
- Markov decision processes
- dynamic programming
- policy gradient
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Computer Science Applications
- Electrical and Electronic Engineering