Generalized emphatic temporal difference learning: Bias-variance analysis

Assaf Hallak, Aviv Tamar, Remi Munos, Shie Mannor

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

We consider the off-policy evaluation problem in Markov decision processes with function approximation. We propose a generalization of the recently introduced emphatic temporal differences (ETD) algorithm (Sutton, Mahmood, and White 2015), which encompasses the original ETD(λ), as well as several other off-policy evaluation algorithms as special cases. We call this framework ETD(λ, β), where our introduced parameter β controls the decay rate of an importancesampling term. We study conditions under which the projected fixed-point equation underlying ETD(λ, β) involves a contraction operator, allowing us to present the first asymptotic error bounds (bias) for ETD(λ, β). Our results show that the original ETD algorithm always involves a contraction operator, and its bias is bounded. Moreover, by controlling β, our proposed generalization allows trading-off bias for variance reduction, thereby achieving a lower total error.

Original languageEnglish
Title of host publication30th AAAI Conference on Artificial Intelligence, AAAI 2016
Pages1631-1637
Number of pages7
ISBN (Electronic)9781577357605
StatePublished - 2016
Event30th AAAI Conference on Artificial Intelligence, AAAI 2016 - Phoenix, United States
Duration: 12 Feb 201617 Feb 2016

Publication series

Name30th AAAI Conference on Artificial Intelligence, AAAI 2016

Conference

Conference30th AAAI Conference on Artificial Intelligence, AAAI 2016
Country/TerritoryUnited States
CityPhoenix
Period12/02/1617/02/16

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Generalized emphatic temporal difference learning: Bias-variance analysis'. Together they form a unique fingerprint.

Cite this