Abstract
We consider the Inverse Reinforcement Learning problem in Contextual Markov
Decision Processes. In this setting, the reward, which is unknown to the agent, is a
function of a static parameter referred to as the context. There is also an “expert”
who knows this mapping and acts according to the optimal policy for each context.
The goal of the agent is to learn the expert’s mapping by observing demonstrations.
We define an optimization problem for finding this mapping and show that when
it is linear, the problem is convex. We present and analyze the sample complexity
of three algorithms for solving this problem: the mirror descent algorithm,
evolution strategies, and the ellipsoid method. We also extend the first two methods
to work with general reward functions, e.g., deep neural networks, but without the
theoretical guarantees. Finally, we compare the different techniques empirically in
a driving simulation and a medical treatment regime.
Original language | Undefined/Unknown |
---|---|
Title of host publication | Eighth International Conference on Learning Representations |
Number of pages | 23 |
State | Published - 2020 |
Event | 8th International Conference on Learning Representations, ICLR 2020 - Addis Ababa, Ethiopia. Duration: 30 Apr 2020 → … |
Conference
Conference | 8th International Conference on Learning Representations, ICLR 2020 |
---|---|
Country/Territory | Ethiopia |
City | Addis Ababa |
Period | 30/04/20 → … |