Unified inter and intra options learning using policy gradient methods

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Temporally extended actions (or macro-actions) have proven useful for speeding up planning and learning, adding robustness, and building prior knowledge into AI systems. The options framework, as introduced in Sutton, Precup and Singh (1999), provides a natural way to incorporate macro-actions into reinforcement learning. In the subgoals approach, learning is divided into two phases: first each option is learned with a prescribed subgoal, and then the learned options are composed together. In this paper we offer a unified framework for concurrent inter- and intra-option learning. To that end, we propose a modular parameterization of intra-option policies together with option termination conditions and the option selection policy (inter-option), and show that these three decision components may be viewed as a unified policy over an augmented state-action space, to which standard policy gradient algorithms may be applied. We identify the basis functions that apply to each of these decision components and show that they possess a useful orthogonality property that allows the natural gradient to be computed independently for each component. We further outline the extension of the suggested framework to several levels of option hierarchy, and conclude with a brief illustrative example.
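The orthogonality claim can be illustrated numerically. The sketch below is not the paper's implementation; it assumes a sigmoid termination function, softmax option-selection and intra-option policies, and linear state features φ(s), and all names (`augmented_actions`, `W_term`, `Theta_sel`, `Eta_intra`) are hypothetical. For a single state it enumerates every augmented action (b, o, a), where b flags termination of the previous option, o is the option that then runs, and a is the primitive action. It computes each augmented action's probability and score vector ∇ log π, then checks that the exact Fisher matrix has zero cross blocks between the three parameter groups.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def augmented_actions(phi, o_prev, W_term, Theta_sel, Eta_intra):
    """Return (probability, score) for every augmented action (b, o, a) in one state.

    Hypothetical parameterization (an assumption, not taken from the paper):
    termination beta_o(s) = sigmoid(W_term[o] . phi), selection mu(o|s) = softmax,
    intra-option pi_o(a|s) = softmax. The score is the gradient of the log-probability,
    concatenated as [termination | selection | intra-option].
    """
    n_opt, n_act, _ = Eta_intra.shape
    beta = sigmoid(W_term[o_prev] @ phi)  # termination probability of the previous option
    mu = softmax(Theta_sel @ phi)         # option-selection (inter-option) policy
    out = []
    for b in (0, 1):
        # no termination: o_prev keeps running; termination: any option may be selected
        options = [o_prev] if b == 0 else range(n_opt)
        for o in options:
            pi = softmax(Eta_intra[o] @ phi)  # intra-option policy of the running option
            for a in range(n_act):
                p = (1 - beta) * pi[a] if b == 0 else beta * mu[o] * pi[a]
                # d log(beta or 1 - beta) / dw = (b - beta) * phi, only in o_prev's row
                g_term = np.zeros_like(W_term)
                g_term[o_prev] = (b - beta) * phi
                # selection score is nonzero only when a new option is selected
                g_sel = np.zeros_like(Theta_sel)
                if b == 1:
                    g_sel = np.outer(np.eye(n_opt)[o] - mu, phi)
                # intra-option score touches only the running option's parameters
                g_intra = np.zeros_like(Eta_intra)
                g_intra[o] = np.outer(np.eye(n_act)[a] - pi, phi)
                score = np.concatenate([g_term.ravel(), g_sel.ravel(), g_intra.ravel()])
                out.append((p, score))
    return out


def fisher(samples):
    """Exact per-state Fisher matrix: sum over x of p(x) * score(x) score(x)^T."""
    return sum(p * np.outer(s, s) for p, s in samples)


# Usage with random, hypothetical parameters: 3 options, 2 actions, 4 features.
rng = np.random.default_rng(0)
n_opt, n_act, d = 3, 2, 4
W_term = rng.normal(size=(n_opt, d))
Theta_sel = rng.normal(size=(n_opt, d))
Eta_intra = rng.normal(size=(n_opt, n_act, d))
phi = rng.normal(size=d)

samples = augmented_actions(phi, 1, W_term, Theta_sel, Eta_intra)
F = fisher(samples)
k1 = W_term.size          # end of the termination block
k2 = k1 + Theta_sel.size  # end of the selection block
print("termination vs. rest:", np.abs(F[:k1, k1:]).max())
print("selection vs. intra-option:", np.abs(F[k1:k2, k2:]).max())
```

Under these assumptions the cross blocks vanish (up to round-off) because each component's score has zero mean given the decisions made before it: the termination flag, then the selected option. The same then holds for the Fisher matrix averaged over states, which is what would let a natural-gradient step F⁻¹∇J be computed block by block. Individual blocks can still be singular (each softmax score vector's components sum to zero), so a pseudo-inverse or regularizer would likely be needed in practice.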

Original language: English
Title of host publication: Recent Advances in Reinforcement Learning - 9th European Workshop, EWRL 2011, Revised Selected Papers
Pages: 153-164
Number of pages: 12
State: Published - 2012
Event: 9th European Workshop on Reinforcement Learning, EWRL 2011 - Athens, Greece
Duration: 9 Sep 2011 → 11 Sep 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 7188 LNAI

Conference

Conference: 9th European Workshop on Reinforcement Learning, EWRL 2011
Country/Territory: Greece
City: Athens
Period: 9/09/11 → 11/09/11

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • General Computer Science

