Latent Geodesics of Model Dynamics for Offline Reinforcement Learning

Guy Tennenholtz, Nir Baram, Shie Mannor

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Model-based offline reinforcement learning approaches generally rely on bounds on model error. While contemporary methods obtain such bounds through an ensemble of models, we propose to estimate them using a data-driven latent metric. Specifically, we build upon recent advances in the Riemannian geometry of generative models to construct a latent metric for an encoder-decoder based forward model. Our proposed metric measures both the quality of out-of-distribution samples and the discrepancy of examples in the data. We show that our metric can be viewed as a combination of two metrics, one relating to proximity and the other to epistemic uncertainty. Finally, we leverage our metric in a pessimistic model-based framework, showing a significant improvement over contemporary model-based offline reinforcement learning benchmarks.
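The abstract's core construction, a latent Riemannian metric induced by a decoder, can be illustrated with a minimal sketch. Assuming a differentiable decoder g, the pullback metric is G(z) = J(z)ᵀJ(z), where J is the decoder's Jacobian, and latent curve lengths are measured under G. The toy decoder, the finite-difference Jacobian, and the function names below are illustrative assumptions, not the paper's implementation (which additionally incorporates an uncertainty term):

```python
import numpy as np

# Hypothetical toy decoder: maps a 1-D latent onto the unit circle in R^2.
def decode(z):
    return np.array([np.cos(z[0]), np.sin(z[0])])

def jacobian(f, z, eps=1e-5):
    """Finite-difference Jacobian of f at z (autodiff would be used in practice)."""
    f0 = f(z)
    J = np.zeros((f0.size, z.size))
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz[i] = eps
        J[:, i] = (f(z + dz) - f0) / eps
    return J

def pullback_metric(f, z):
    """Riemannian pullback metric G(z) = J(z)^T J(z) induced by decoder f."""
    J = jacobian(f, z)
    return J.T @ J

def curve_length(f, z_start, z_end, steps=100):
    """Approximate the length of a straight latent-space line under the
    pullback metric by summing sqrt(dz^T G(z) dz) over small segments."""
    ts = np.linspace(0.0, 1.0, steps + 1)
    pts = [z_start + t * (z_end - z_start) for t in ts]
    length = 0.0
    for a, b in zip(pts[:-1], pts[1:]):
        mid = 0.5 * (a + b)
        dz = b - a
        G = pullback_metric(f, mid)
        length += np.sqrt(dz @ G @ dz)
    return length

# For the unit-circle decoder, latent distance equals arc length,
# so the curve from 0 to pi/2 has length close to pi/2.
print(curve_length(decode, np.array([0.0]), np.array([np.pi / 2])))
```

Minimizing `curve_length` over latent curves (rather than fixing a straight line) would yield a geodesic; the paper's metric further penalizes regions of high epistemic uncertainty, so geodesics avoid out-of-distribution latents.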
Original language: Undefined/Unknown
Title of host publication: Deep RL Workshop NeurIPS 2021
Number of pages: 21
State: Published - 2021
Event: Deep RL Workshop NeurIPS
Duration: 13 Dec 2021 – 13 Dec 2021
https://sites.google.com/view/deep-rl-workshop-neurips2021

Conference

Conference: Deep RL Workshop NeurIPS
Period: 13/12/21 – 13/12/21
Internet address: https://sites.google.com/view/deep-rl-workshop-neurips2021