Towards Faster Global Convergence of Robust Policy Gradient Methods

Navdeep Kumar, Ilnura Usmanova, Kfir Yehuda Levy, Shie Mannor

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Recently, global convergence has been achieved for non-robust MDPs with an iteration complexity of O(1/ϵ) for finding an ϵ-optimal policy, for which a PL condition derived from the performance difference lemma has played a key role. This work extends the performance difference lemma to s-rectangular robust MDPs, from which a PL condition can be derived. We further present a simplified proof of policy gradient convergence for the non-robust case, which, together with the robust performance difference lemma, can lead to global convergence of robust policy gradient methods.
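
For context, the non-robust versions of the two tools named in the abstract are standard results (e.g., Kakade & Langford, 2002; Agarwal et al., 2021); the s-rectangular robust extensions are the paper's contribution and are not reproduced here. In the usual notation, with d_μ^π the discounted state-occupancy measure and A^π the advantage function, the performance difference lemma reads

    V^{π'}(μ) − V^{π}(μ) = (1/(1−γ)) · E_{s∼d_μ^{π'}} E_{a∼π'(·|s)} [ A^{π}(s, a) ],

and, for directly parameterized policies, it yields the gradient-domination (PL-type) inequality

    V^{π*}(μ) − V^{π}(μ) ≤ (1/(1−γ)) · ‖d_μ^{π*}/μ‖_∞ · max_{π̄∈Π} ⟨π̄ − π, ∇_π V^{π}(μ)⟩.

Combined with the smoothness of V^π, this inequality is what drives the O(1/ϵ) iteration complexity mentioned above; the notation here may differ from the paper's.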
Original language: Undefined/Unknown
Title of host publication: Sixteenth European Workshop on Reinforcement Learning
Number of pages: 13
State: Published - 2023
Event: Sixteenth European Workshop on Reinforcement Learning - Brussels, Belgium
Duration: 14 Sep 2023 → 14 Sep 2023
https://openreview.net/group?id=EWRL/2023/Workshop

Conference

Conference: Sixteenth European Workshop on Reinforcement Learning
Abbreviated title: EWRL16
Country/Territory: Belgium
City: Brussels
Period: 14/09/23 → 14/09/23
Internet address: https://openreview.net/group?id=EWRL/2023/Workshop
