Temporal difference methods for the variance of the reward to go

Aviv Tamar, Dotan Di Castro, Shie Mannor

Research output: Contribution to journal › Conference article › peer-review

Abstract

In this paper we extend temporal difference policy evaluation algorithms to performance criteria that include the variance of the cumulative reward. Such criteria are useful for risk management, and are important in domains such as finance and process control. We propose variants of both TD(0) and LSTD(λ) with linear function approximation, prove their convergence, and demonstrate their utility in a 4-dimensional continuous state space problem.
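
To make the idea concrete, the sketch below shows a coupled TD(0)-style update for linear estimates of the reward-to-go J(s) and its second moment M(s), from which a variance estimate V(s) = M(s) - J(s)^2 follows. This is a minimal illustration of the general technique the abstract describes, not the authors' exact algorithm; the function name, step size `alpha`, discount `gamma`, and feature map `phi` are assumptions introduced for the example.

```python
import numpy as np

def td0_second_moment_step(phi_s, phi_s_next, r, w_j, w_m, alpha, gamma=1.0):
    """One TD(0)-style update with linear function approximation for
    J(s) ~ w_j . phi(s) (expected reward-to-go) and
    M(s) ~ w_m . phi(s) (second moment of the reward-to-go).
    A variance estimate is then V(s) = M(s) - J(s)**2.
    Illustrative sketch; not the paper's exact algorithm.
    """
    j_next = w_j @ phi_s_next
    m_next = w_m @ phi_s_next

    # Standard TD error for the first moment.
    delta_j = r + gamma * j_next - (w_j @ phi_s)

    # TD error for the second moment, based on the Bellman-style relation
    # M(s) = E[ r^2 + 2*gamma*r*J(s') + gamma^2 * M(s') ].
    delta_m = r**2 + 2.0 * gamma * r * j_next + gamma**2 * m_next - (w_m @ phi_s)

    # Semi-gradient updates on the shared linear features.
    w_j = w_j + alpha * delta_j * phi_s
    w_m = w_m + alpha * delta_m * phi_s
    return w_j, w_m
```

Note that the second-moment update reuses the current estimate of J at the next state, so the two approximations are learned jointly, mirroring the coupling between the mean and variance of the cumulative reward.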

Original language: English
Pages (from-to): 1532-1540
Number of pages: 9
Journal: Proceedings of Machine Learning Research
Volume: 28
State: Published - 2013
Event: 30th International Conference on Machine Learning, ICML 2013 - Atlanta, GA, United States
Duration: 16 Jun 2013 - 21 Jun 2013

All Science Journal Classification (ASJC) codes

  • Human-Computer Interaction
  • Sociology and Political Science
