Abstract
Recently, global convergence with an iteration complexity of O(1/ϵ) for finding an ϵ-optimal policy has been established for non-robust MDPs, where a PL condition derived from the performance difference lemma plays a key role. This work extends the performance difference lemma to s-rectangular robust MDPs, from which a PL condition can be derived. We further present a simplified proof of policy gradient convergence in the non-robust case, which, together with the robust performance difference lemma, leads to global convergence of robust policy gradient.
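
For context, here is a minimal sketch of the two non-robust ingredients the abstract builds on, in one standard formulation. The notation (value V^π, discount γ, advantage A^π, discounted state-occupancy d^π_μ, direct policy parameterization) is assumed here and is not taken from the paper; the s-rectangular robust analogues are the paper's actual contribution and are not reproduced.

```latex
% Performance difference lemma (non-robust MDPs):
V^{\pi'}(\mu) - V^{\pi}(\mu)
  = \frac{1}{1-\gamma}\,
    \mathbb{E}_{s \sim d^{\pi'}_{\mu}}\,
    \mathbb{E}_{a \sim \pi'(\cdot \mid s)}
    \bigl[ A^{\pi}(s,a) \bigr]

% One standard PL-type (gradient-domination) condition it yields under
% direct parameterization, which drives the O(1/epsilon) iteration
% complexity in the non-robust setting:
V^{\pi^{*}}(\mu) - V^{\pi}(\mu)
  \le \frac{1}{1-\gamma}
      \Bigl\| \frac{d^{\pi^{*}}_{\mu}}{\mu} \Bigr\|_{\infty}
      \max_{\bar{\pi} \in \Delta(\mathcal{A})^{\mathcal{S}}}
      \bigl\langle \bar{\pi} - \pi,\; \nabla_{\pi} V^{\pi}(\mu) \bigr\rangle
```

Because the right-hand side of the second inequality vanishes only at an optimal policy, policy gradient ascent closes the optimality gap at an O(1/ϵ) iteration rate; the robust performance difference lemma is intended to play the same role for s-rectangular robust MDPs.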
| Original language | Undefined/Unknown |
|---|---|
| Title of host publication | Sixteenth European Workshop on Reinforcement Learning |
| Number of pages | 13 |
| State | Published - 2023 |
| Event | Sixteenth European Workshop on Reinforcement Learning, Brussels, Belgium, 14 Sep 2023 → 14 Sep 2023 (https://openreview.net/group?id=EWRL/2023/Workshop) |
Conference
| Conference | Sixteenth European Workshop on Reinforcement Learning |
|---|---|
| Abbreviated title | EWRL16 |
| Country/Territory | Belgium |
| City | Brussels |
| Period | 14/09/23 → 14/09/23 |
| Internet address | https://openreview.net/group?id=EWRL/2023/Workshop |