Convergence of Online and Approximate Multiple-Step Lookahead Policy Iteration

Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Anderson (1965) acceleration is an old and simple method for accelerating the computation of a fixed point. However, as far as we know and quite surprisingly, it has never been applied to dynamic programming or reinforcement learning. In this paper, we briefly explain what Anderson acceleration is and how it can be applied to value iteration, supported by preliminary experiments showing a significant speed-up of convergence, which we discuss critically. We also discuss how this idea could be applied more generally to (deep) reinforcement learning.
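The abstract describes the idea only in prose; the following Python sketch illustrates what Anderson acceleration wrapped around tabular value iteration can look like. It is an illustration under stated assumptions, not the paper's algorithm: the memory size m, the ridge term, and the MDP encoding (transitions P as an (A, S, S) tensor, rewards R as an (A, S) array) are choices made for this example.

import numpy as np

def bellman_operator(V, P, R, gamma):
    # Bellman optimality operator T for a tabular MDP.
    # P: (A, S, S) transition probabilities, R: (A, S) expected rewards.
    # (T V)(s) = max_a [ R[a, s] + gamma * sum_s' P[a, s, s'] V[s'] ]
    return np.max(R + gamma * (P @ V), axis=0)

def anderson_vi(P, R, gamma, m=5, iters=500, tol=1e-8):
    # Value iteration with Anderson mixing over the last m+1 iterates.
    V = np.zeros(R.shape[1])
    Vs = [V]
    Fs = [bellman_operator(V, P, R, gamma) - V]   # residuals f_i = T(V_i) - V_i
    for _ in range(iters):
        k = len(Fs)
        F = np.stack(Fs, axis=1)                  # (S, k) residual matrix
        # Solve min_alpha ||F alpha|| subject to sum(alpha) = 1 via the
        # normal equations of the equality-constrained least squares.
        G = F.T @ F + 1e-10 * np.eye(k)           # small ridge for stability
        alpha = np.linalg.solve(G, np.ones(k))
        alpha /= alpha.sum()
        # Anderson update: combine the images T(V_i) = V_i + f_i.
        V = sum(a * (v + f) for a, v, f in zip(alpha, Vs, Fs))
        f = bellman_operator(V, P, R, gamma) - V
        if np.max(np.abs(f)) < tol:
            break
        Vs.append(V)
        Fs.append(f)
        Vs, Fs = Vs[-(m + 1):], Fs[-(m + 1):]     # sliding memory window
    return V

With k = 1 the update reduces to plain value iteration V <- T V, so the scheme only departs from standard VI once the memory fills. To try it, one can build a random MDP, e.g. P = rng.random((A, S, S)) normalized along its last axis and R = rng.random((A, S)), then call anderson_vi(P, R, gamma=0.95). A known caveat of Anderson acceleration in general, consistent with the critical discussion the abstract mentions, is that the mixed update need not be a contraction, so the monotonicity and convergence guarantees of standard value iteration do not automatically carry over.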
Original language: English
Title of host publication: EWRL 2018 - 14th European Workshop on Reinforcement Learning
Number of pages: 26
State: Published - 2018
Event: 14th European Workshop on Reinforcement Learning - Lille, France
Duration: 1 Oct 2018 - 3 Oct 2018
Conference number: 14
https://ewrl.wordpress.com/past-ewrl/ewrl14-2018/

Conference

Conference: 14th European Workshop on Reinforcement Learning
Abbreviated title: EWRL
Country/Territory: France
City: Lille
Period: 1/10/18 - 3/10/18
Internet address: https://ewrl.wordpress.com/past-ewrl/ewrl14-2018/
