Finite sample analyses for TD(0) with function approximation

Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

TD(0) is one of the most commonly used algorithms in reinforcement learning. Despite this, there is no existing finite sample analysis for TD(0) with function approximation, even for the linear case. Our work is the first to provide such results. Works that managed to obtain convergence rates for online Temporal Difference (TD) methods analyzed somewhat modified versions of them that include projections and stepsize dependent on unknown problem parameters. Our analysis obviates these artificial alterations by exploiting strong properties of TD(0). We provide convergence rates both in expectation and with high-probability. Both are based on relatively unknown, recently developed stochastic approximation techniques.

Original languageEnglish
Title of host publication32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Pages6144-6160
Number of pages17
ISBN (Electronic)9781577358008
StatePublished - 2018
Event32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States
Duration: 2 Feb 20187 Feb 2018

Publication series

Name32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Conference

Conference32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Country/TerritoryUnited States
CityNew Orleans
Period2/02/187/02/18

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence

Fingerprint

Dive into the research topics of 'Finite sample analyses for TD(0) with function approximation'. Together they form a unique fingerprint.

Cite this