TY - GEN
T1 - Beyond the one-step greedy approach in reinforcement learning
AU - Efroni, Yonathan
AU - Dalal, Gal
AU - Scherrer, Bruno
AU - Mannor, Shie
N1 - Publisher Copyright: © 2018 35th International Conference on Machine Learning, ICML 2018. All rights reserved.
PY - 2018
Y1 - 2018
AB - The famous Policy Iteration algorithm alternates between policy improvement and policy evaluation. Implementations of this algorithm with several variants of the latter evaluation stage, e.g., n-step and trace-based returns, have been analyzed in previous works. However, the case of multiple-step lookahead policy improvement, despite the recent increase in empirical evidence of its strength, has, to our knowledge, not been carefully analyzed yet. In this work, we introduce the first such analysis. Namely, we formulate variants of multiple-step policy improvement, derive new algorithms using these definitions, and prove their convergence. Moreover, we show that recent prominent Reinforcement Learning algorithms fit well into our unified framework. We thus shed light on their empirical success and give a recipe for deriving new algorithms for future study.
UR - http://www.scopus.com/inward/record.url?scp=85057338601&partnerID=8YFLogxK
M3 - Conference contribution
T3 - 35th International Conference on Machine Learning, ICML 2018
SP - 2239
EP - 2252
BT - 35th International Conference on Machine Learning, ICML 2018
A2 - Dy, Jennifer
A2 - Krause, Andreas
T2 - 35th International Conference on Machine Learning, ICML 2018
Y2 - 10 July 2018 through 15 July 2018
ER -