Abstract
Model-based offline reinforcement learning approaches generally rely on bounds on model error. While contemporary methods obtain such bounds through an ensemble of models, we propose to estimate them using a data-driven latent metric. In particular, we build upon recent advances in the Riemannian geometry of generative models to construct a latent metric for an encoder-decoder based forward model. Our proposed metric measures both the quality of out-of-distribution samples and the discrepancy of examples in the data. We show that our metric can be viewed as a combination of two metrics, one relating to proximity and the other to epistemic uncertainty. Finally, we leverage our metric in a pessimistic model-based framework, showing a significant improvement over contemporary model-based offline reinforcement learning benchmarks.
| Original language | Undefined/Unknown |
|---|---|
| Title of host publication | Deep RL Workshop NeurIPS 2021 |
| Number of pages | 21 |
| State | Published - 2021 |
| Event | Deep RL Workshop NeurIPS, 13 Dec 2021 → 13 Dec 2021 (https://sites.google.com/view/deep-rl-workshop-neurips2021) |
Conference
| Conference | Deep RL Workshop NeurIPS |
|---|---|
| Period | 13/12/21 → 13/12/21 |
| Internet address | https://sites.google.com/view/deep-rl-workshop-neurips2021 |