Abstract
Robust Markov Decision Processes (RMDPs) aim to ensure robustness with respect to changing or adversarial system behavior. In this framework, transitions are modeled as arbitrary elements of a known and properly structured uncertainty set, and a robust optimal policy can be derived under the worst-case scenario. In this study, we address the issue of learning in RMDPs using a Bayesian approach. We introduce the Uncertainty Robust Bellman Equation (URBE), which encourages safe exploration for adapting the uncertainty set to new observations while preserving robustness. We propose a URBE-based algorithm, DQN-URBE, that scales this method to higher-dimensional domains. Our experiments show that the derived URBE-based strategy leads to a better trade-off between less conservative solutions and robustness in the presence of model misspecification. In addition, we show that the DQN-URBE algorithm can adapt significantly faster to changing dynamics online than existing robust techniques with fixed uncertainty sets.
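
The worst-case planning step that the abstract attributes to RMDPs can be illustrated with a minimal sketch of standard robust value iteration. This is not the URBE or DQN-URBE method from the paper. It assumes a fixed, finite, (s, a)-rectangular uncertainty set, and the function name `robust_value_iteration`, the array layout, and the two-state toy problem are all hypothetical choices made for illustration.

```python
import numpy as np


def robust_value_iteration(P_set, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Robust value iteration over a finite uncertainty set (illustrative).

    P_set: shape (K, S, A, S), K candidate transition models (assumed layout).
    R:     shape (S, A), immediate rewards.
    Assumes (s, a)-rectangularity: the adversary picks the worst candidate
    model independently for every state-action pair.
    """
    n_states = P_set.shape[1]
    V = np.zeros(n_states)
    for _ in range(max_iter):
        # Expected next-state value under each candidate model: (K, S, A).
        expected_next = P_set @ V
        # Worst case over the uncertainty set, then greedy over actions.
        Q = R + gamma * expected_next.min(axis=0)
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)


# Toy problem (hypothetical): state 1 is absorbing with zero reward.
# In state 0, action 0 is "safe" (reward 1, stays in state 0) and
# action 1 is "risky" (reward 2, but one model sends it to state 1).
R = np.array([[1.0, 2.0],
              [0.0, 0.0]])

P_nominal = np.zeros((2, 2, 2))
P_nominal[0, 0, 0] = 1.0   # safe action stays in state 0
P_nominal[0, 1, 0] = 1.0   # risky action also stays in state 0
P_nominal[1, :, 1] = 1.0   # state 1 is absorbing

P_adverse = P_nominal.copy()
P_adverse[0, 1] = [0.0, 1.0]   # risky action now falls into state 1

P_set = np.stack([P_nominal, P_adverse])

V_robust, pi_robust = robust_value_iteration(P_set, R)
V_nominal, pi_nominal = robust_value_iteration(P_set[:1], R)

print("robust :", V_robust, pi_robust)
print("nominal:", V_nominal, pi_nominal)
```

In this toy problem the nominal planner picks the risky action in state 0, while the robust planner picks the safe one and accepts a lower value. That gap is the conservatism the abstract refers to; how the uncertainty set is adapted to new observations is the subject of the paper and is not modeled here.
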
Original language | English |
---|---|
Title of host publication | 35th Conference on Uncertainty in Artificial Intelligence, UAI 2019 |
State | Published - 2019 |
Event | 35th Conference on Uncertainty in Artificial Intelligence, UAI 2019 - Tel Aviv, Israel; Duration: 22 Jul 2019 → 25 Jul 2019 |
Conference
Conference | 35th Conference on Uncertainty in Artificial Intelligence, UAI 2019 |
---|---|
Country/Territory | Israel |
City | Tel Aviv |
Period | 22/07/19 → 25/07/19 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence