Robust Value Iteration for Continuous Control Tasks

Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

When transferring a control policy from simulation to a physical system, the policy needs to be robust to variations in the dynamics to perform well. Commonly, the optimal policy overfits to the approximate model and the corresponding state distribution, often resulting in a failure to transfer under distributional shifts. In this paper, we present Robust Fitted Value Iteration, which uses dynamic programming to compute the optimal value function on the compact state domain and incorporates adversarial perturbations of the system dynamics. The adversarial perturbations encourage an optimal policy that is robust to changes in the dynamics. Utilizing the continuous-time perspective of reinforcement learning, we derive the optimal perturbations for the states, actions, observations and model parameters in closed form. Notably, the resulting algorithm does not require discretization of states or actions. Therefore, the optimal adversarial perturbations can be efficiently incorporated in the min-max value function update. We apply the resulting algorithm to the physical Furuta pendulum and cartpole. By changing the masses of the systems, we evaluate the quantitative and qualitative performance across different model parameters. We show that robust value iteration is more robust than deep reinforcement learning algorithms and the non-robust version of the algorithm. Videos of the experiments are shown at https://sites.google.com/view/rfvi
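
The min-max value function update mentioned in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it assumes a small discretized 1-D chain, a hypothetical reward and a hypothetical finite set of adversarial shifts, whereas the paper works in continuous time, without discretization and with closed-form perturbations. It only shows the structure of the update, in which the controller maximizes over actions while the adversary minimizes over perturbations of the dynamics.

```python
import numpy as np


def value_iteration(n_states=11, goal=5, gamma=0.9, adversary=(0,), n_iters=200):
    """Tabular min-max value iteration on a 1-D chain (illustrative toy only).

    All names and numbers here are assumptions made for this sketch.
    `adversary` is the set of state shifts the adversary may add to the
    dynamics; adversary=(0,) recovers ordinary (non-robust) value iteration.
    """
    states = np.arange(n_states)
    actions = np.array([-2, -1, 0, 1, 2])           # controller moves
    xis = np.array(adversary)                       # adversarial shifts
    reward = -np.abs(states - goal).astype(float)   # distance-to-goal cost
    V = np.zeros(n_states)
    for _ in range(n_iters):
        # Next state for every (state, action, perturbation) triple.
        nxt = states[:, None, None] + actions[None, :, None] + xis[None, None, :]
        nxt = np.clip(nxt, 0, n_states - 1)
        q = reward[:, None, None] + gamma * V[nxt]
        # Adversary minimizes over perturbations, controller maximizes over actions.
        V = q.min(axis=2).max(axis=1)
    return V


V_nominal = value_iteration(adversary=(0,))
V_robust = value_iteration(adversary=(-1, 0, 1))
```

Because the adversary's set contains the zero shift, the robust value function in this toy is never larger than the nominal one. The gap between the two reflects how much return the controller gives up to guard against the worst-case perturbation.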

Original language: English
Title of host publication: Robotics
Subtitle of host publication: Science and Systems XVII
Editors: Dylan A. Shell, Marc Toussaint, M. Ani Hsieh
Publisher: MIT Press Journals
ISBN (Print): 9780992374778
DOIs
State: Published - 2021
Event: 17th Robotics: Science and Systems, RSS 2021 - Virtual, Online
Duration: 12 Jul 2021 to 16 Jul 2021

Publication series

Name: Robotics: Science and Systems

Conference

Conference: 17th Robotics: Science and Systems, RSS 2021
City: Virtual, Online
Period: 12/07/21 to 16/07/21

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Electrical and Electronic Engineering
  • Control and Systems Engineering
