Optimizing the CVaR via sampling

Aviv Tamar, Yonatan Glassner, Shie Mannor

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

Conditional Value at Risk (CVaR) is a prominent risk measure that is being used extensively in various domains. We develop a new formula for the gradient of the CVaR in the form of a conditional expectation. Based on this formula, we propose a novel sampling-based estimator for the gradient of the CVaR, in the spirit of the likelihood-ratio method. We analyze the bias of the estimator, and prove the convergence of a corresponding stochastic gradient descent algorithm to a local CVaR optimum. Our method allows to consider CVaR optimization in new domains. As an example, we consider a reinforcement learning application, and learn a risksensitive controller for the game of Tetris.

Original languageEnglish
Title of host publicationProceedings of the 29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015
Pages2993-2999
Number of pages7
ISBN (Electronic)9781577357025
StatePublished - 1 Jun 2015
Event29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015 - Austin, United States
Duration: 25 Jan 201530 Jan 2015

Publication series

NameProceedings of the National Conference on Artificial Intelligence
Volume4

Conference

Conference29th AAAI Conference on Artificial Intelligence, AAAI 2015 and the 27th Innovative Applications of Artificial Intelligence Conference, IAAI 2015
Country/TerritoryUnited States
CityAustin
Period25/01/1530/01/15

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Optimizing the CVaR via sampling'. Together they form a unique fingerprint.

Cite this