Time-regularized interrupting options

Daniel J. Mankowitz, Timothy A. Mann, Shie Mannor

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

Abstract

High-level skills relieve planning algorithms from low-level details. But when the skills are poorly designed for the domain, the resulting plan may be severely suboptimal. Sutton et al. (1999) made an important step towards resolving this problem by introducing a rule that automatically improves a set of skills called options. This rule terminates an option early whenever switching to another option gives a higher value than continuing with the current option. However, they only analyzed the case where the improvement rule is applied once. We show conditions where this rule converges to the optimal set of options. A new interrupting Bellman operator that simultaneously improves the set of options is at the core of our analysis. One problem with the update rule is that it tends to favor lower-level skills. We introduce a regularization term that favors longer duration skills. Experimental results demonstrate that this approach can derive a good set of high-level skills even when the original set of skills cannot solve the problem.

Original languageEnglish
Title of host publication31st International Conference on Machine Learning, ICML 2014
Pages1350-1358
Number of pages9
ISBN (Electronic)9781634393973
StatePublished - 2014
Event31st International Conference on Machine Learning, ICML 2014 - Beijing, China
Duration: 21 Jun 201426 Jun 2014

Publication series

Name31st International Conference on Machine Learning, ICML 2014
Volume4

Conference

Conference31st International Conference on Machine Learning, ICML 2014
Country/TerritoryChina
CityBeijing
Period21/06/1426/06/14

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Networks and Communications
  • Software

Fingerprint

Dive into the research topics of 'Time-regularized interrupting options'. Together they form a unique fingerprint.

Cite this