A Bayesian approach to robust reinforcement learning

Esther Derman, Daniel Mankowitz, Timothy Mann, Shie Mannor

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

Robust Markov Decision Processes (RMDPs) are intended to ensure robustness with respect to changing or adversarial system behavior. In this framework, transitions are modeled as arbitrary elements of a known and properly structured uncertainty set, and a robust optimal policy can be derived under the worst-case scenario. In this study, we address the issue of learning in RMDPs using a Bayesian approach. We introduce the Uncertainty Robust Bellman Equation (URBE), which encourages safe exploration for adapting the uncertainty set to new observations while preserving robustness. We propose a URBE-based algorithm, DQN-URBE, that scales this method to higher-dimensional domains. Our experiments show that the derived URBE-based strategy leads to a better trade-off between less conservative solutions and robustness in the presence of model misspecification. In addition, we show that the DQN-URBE algorithm can adapt significantly faster to changing dynamics online than existing robust techniques with fixed uncertainty sets.
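
As background for the worst-case setting the abstract describes, here is a minimal, hypothetical sketch of a robust Bellman backup: value iteration in which each update takes the minimum expected next-state value over a finite set of candidate transition kernels (the uncertainty set), then maximizes over actions. All names and the toy model are assumptions for illustration; this is not the paper's URBE or DQN-URBE method, which additionally adapts the uncertainty set to new observations via a Bayesian posterior.

```python
import numpy as np

def robust_value_iteration(models, rewards, gamma=0.9, tol=1e-8):
    """Worst-case (robust) value iteration over a finite uncertainty set.

    models:  array of shape (K, A, S, S); K candidate transition kernels,
             where models[k, a, s, :] is a distribution over next states.
    rewards: array of shape (A, S); reward for taking action a in state s.
    """
    K, A, S, _ = models.shape
    V = np.zeros(S)
    while True:
        # Expected next value under each kernel: shape (K, A, S).
        next_vals = models @ V
        # Robust backup: minimize over kernels, then add reward.
        Q = rewards + gamma * next_vals.min(axis=0)   # shape (A, S)
        V_new = Q.max(axis=0)                          # greedy over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# Toy example (hypothetical): 2 states, 2 actions, 2 candidate kernels.
rng = np.random.default_rng(0)
models = rng.dirichlet(np.ones(2), size=(2, 2, 2))    # (K, A, S, S)
rewards = rng.random((2, 2))                           # (A, S)
V, policy = robust_value_iteration(models, rewards)
print("robust values:", V, "robust policy:", policy)
```

Because the backup contracts in the sup-norm for gamma < 1, the iteration converges to the robust value function for this finite uncertainty set; the paper's contribution lies in replacing the fixed set with one updated from posterior samples.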

Original language: English
Title of host publication: 35th Conference on Uncertainty in Artificial Intelligence, UAI 2019
State: Published - 2019
Event: 35th Conference on Uncertainty in Artificial Intelligence, UAI 2019 - Tel Aviv, Israel
Duration: 22 Jul 2019 - 25 Jul 2019

Conference

Conference: 35th Conference on Uncertainty in Artificial Intelligence, UAI 2019
Country/Territory: Israel
City: Tel Aviv
Period: 22/07/19 - 25/07/19

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
