Multiple-step greedy policies in online and approximate reinforcement learning

Yonathan Efroni, Bruno Scherrer, Gal Dalal, Shie Mannor

Research output: Contribution to journal › Conference article › peer-review

Abstract

Multiple-step lookahead policies have demonstrated strong empirical performance in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control. In a recent work [5], multiple-step greedy policies and their use in vanilla Policy Iteration algorithms were proposed and analyzed. In this work, we study multiple-step greedy algorithms in more practical setups. We begin by highlighting a counter-intuitive difficulty arising with soft-policy updates: even in the absence of approximations, and contrary to the 1-step greedy case, monotonic policy improvement is not guaranteed unless the update stepsize is sufficiently large. Taking particular care of this difficulty, we formulate and analyze online and approximate algorithms that use such a multiple-step greedy operator.
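
As context for the difficulty described above, the following is a minimal sketch (not taken from the paper itself) of one standard way to write an h-step greedy policy and a soft-policy update. The notation here is illustrative: T_π and T denote the Bellman operator of policy π and the Bellman optimality operator of a discounted MDP, v_{π_k} is the value of the current policy, and h and α stand for the lookahead depth and the update stepsize.

% Illustrative notation only; an assumed formalization, not quoted from the paper.
\[
  \pi' \in \operatorname*{arg\,max}_{\pi}\; T_{\pi}\, T^{\,h-1} v_{\pi_k}
  \qquad \text{($h$-step greedy policy w.r.t.\ } v_{\pi_k}\text{, componentwise),}
\]
\[
  \pi_{k+1} = (1-\alpha)\,\pi_k + \alpha\,\pi',
  \qquad \alpha \in (0,1]
  \qquad \text{(soft-policy update with stepsize } \alpha\text{).}
\]

In this notation, the abstract's observation is that for h > 1 the improvement v_{π_{k+1}} ≥ v_{π_k} can fail when α is too small, whereas for h = 1 (the classical greedy step) any α ∈ (0, 1] yields monotonic improvement.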

Original language: English
Pages (from-to): 5238-5247
Number of pages: 10
Journal: Advances in Neural Information Processing Systems
Volume: 2018-December
State: Published - 2018
Event: 32nd Conference on Neural Information Processing Systems, NeurIPS 2018 - Montreal, Canada
Duration: 2 Dec 2018 – 8 Dec 2018

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
