Stabilizing Off-Policy Reinforcement Learning with Conservative Policy Gradients

Chen Tessler, Nadav Merlis, Shie Mannor

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In recent years, advances in deep learning have enabled the application of reinforcement learning algorithms to complex domains. However, these algorithms lack the theoretical guarantees present in the tabular setting and suffer from many stability and reproducibility problems (Henderson et al., 2018). In this work, we suggest a simple approach for improving stability and providing probabilistic performance guarantees in off-policy actor-critic deep reinforcement learning regimes. Experiments on continuous-action tasks from the MuJoCo control suite show that our proposed method reduces the variance of the training process and improves overall performance.
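
The abstract does not spell out the update rule, but the title points to a conservative acceptance test applied to policy-gradient updates. Below is a minimal sketch of that general idea, assuming a DDPG/TD3-style PyTorch actor and critic (the critic takes a state-action pair) and a hypothetical margin parameter; the function name, signature, and acceptance criterion are illustrative assumptions, not the authors' exact method.

    import copy
    import torch

    def conservative_policy_step(actor, critic, states, lr=1e-3, margin=0.0):
        # Illustrative conservative update (assumed form, not the paper's exact rule):
        # take a gradient step on a copy of the actor and keep it only if the critic
        # judges the new policy at least as good as the old one on this batch.
        with torch.no_grad():
            old_value = critic(states, actor(states)).mean()

        # Propose an update on a throwaway copy of the actor.
        candidate = copy.deepcopy(actor)
        cand_opt = torch.optim.Adam(candidate.parameters(), lr=lr)
        cand_opt.zero_grad()
        loss = -critic(states, candidate(states)).mean()  # deterministic policy-gradient loss
        loss.backward()
        cand_opt.step()

        # Accept the candidate only if its estimated value clears the margin.
        with torch.no_grad():
            new_value = critic(states, candidate(states)).mean()
        if new_value >= old_value + margin:
            actor.load_state_dict(candidate.state_dict())
            return True   # update accepted
        return False      # update rejected; the previous policy is kept

In practice, the probabilistic guarantee mentioned in the abstract would come from evaluating the candidate with a confidence bound (e.g., on a held-out replay batch) rather than the fixed margin used in this sketch; the margin here merely stands in for such a test.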
Original language: English
Title of host publication: Eighth International Conference on Learning Representations
Number of pages: 16
State: Published - 2020
Event: 8th International Conference on Learning Representations, ICLR 2020 - Addis Ababa, Ethiopia
Duration: 30 Apr 2020 → …

Conference

Conference: 8th International Conference on Learning Representations, ICLR 2020
Country/Territory: Ethiopia
City: Addis Ababa
Period: 30/04/20 → …
