Integrating partial model knowledge in model free RL algorithms

Aviv Tamar, Dotan Di Castro, Ron Meir

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In reinforcement learning, an agent uses online feedback from the environment and prior knowledge to adaptively select an effective policy. Model-free approaches address this task by directly mapping external and internal states to actions, while model-based methods attempt to construct a model of the environment and then select optimal actions based on that model. Given the complementary advantages of the two approaches, we suggest a novel algorithm that combines them by switching between a model-based and a model-free mode, depending on the current environmental state and on the status of the agent's knowledge. We prove that such an approach leads to improved performance whenever environmental knowledge is available, without compromising performance when such knowledge is absent. Numerical simulations demonstrate the effectiveness of the approach and suggest its efficacy in boosting policy gradient learning.
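The switching idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm, which targets policy gradient learning. It is a minimal tabular example under stated assumptions: the class `HybridAgent`, its `partial_model` dictionary, and the rule "use the model only when every action in the current state is covered" are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


class HybridAgent:
    """Toy agent that switches between a model-based and a model-free mode.

    Hypothetical illustration only; not the method from the paper.
    """

    def __init__(self, actions, partial_model, gamma=0.9, alpha=0.5,
                 epsilon=0.1, seed=0):
        self.actions = list(actions)
        # Assumed form of partial knowledge:
        # (state, action) -> (next_state, reward), given only where known.
        self.partial_model = dict(partial_model)
        self.gamma = gamma
        self.alpha = alpha
        self.epsilon = epsilon
        self.q = defaultdict(float)  # model-free action-value table
        self.rng = random.Random(seed)

    def model_known(self, state):
        # The model is usable in a state only if it covers every action there.
        return all((state, a) in self.partial_model for a in self.actions)

    def value(self, state):
        return max(self.q[(state, a)] for a in self.actions)

    def act(self, state):
        if self.model_known(state):
            # Model-based mode: one-step lookahead through the known model.
            def lookahead(a):
                next_state, reward = self.partial_model[(state, a)]
                return reward + self.gamma * self.value(next_state)
            return "model_based", max(self.actions, key=lookahead)
        # Model-free mode: epsilon-greedy on the learned Q-table.
        if self.rng.random() < self.epsilon:
            return "model_free", self.rng.choice(self.actions)
        return "model_free", max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update from an observed transition.
        target = reward + self.gamma * self.value(next_state)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

In this sketch, a state covered by `partial_model` is handled by lookahead, while every other state falls back to learned values, so missing knowledge only removes the shortcut and leaves the model-free learner unchanged.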

Original language: English
Title of host publication: Proceedings of the 28th International Conference on Machine Learning, ICML 2011
Pages: 305-312
Number of pages: 8
ISBN (Electronic): 978-1-4503-0619-5
State: Published - 2011
Event: 28th International Conference on Machine Learning, ICML 2011 - Bellevue, WA, United States
Duration: 28 Jun 2011 to 2 Jul 2011

Publication series

Name: Proceedings of the 28th International Conference on Machine Learning, ICML 2011

Conference

Conference: 28th International Conference on Machine Learning, ICML 2011
Country/Territory: United States
City: Bellevue, WA
Period: 28/06/11 to 2/07/11

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Human-Computer Interaction
  • Education
