Abstract
We show how options, a class of control structures encompassing primitive and temporally extended actions, can play a valuable role in planning in MDPs with continuous state-spaces. Analyzing the convergence rate of Approximate Value Iteration with options reveals that for pessimistic initial value function estimates, options can speed up convergence compared to planning with only primitive actions even when the temporally extended actions are suboptimal and sparsely scattered throughout the state-space. Our experimental results in an optimal replacement task and a complex inventory management task demonstrate the potential for options to speed up convergence in practice. We show that options induce faster convergence to the optimal value function, which implies deriving better policies with fewer iterations.
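
The convergence claim can be illustrated with a minimal sketch. It is not the paper's algorithm or experimental setup: it uses exact tabular value iteration on a hypothetical deterministic chain (a finite state space, not a continuous one), with a cost of -1 per step, a terminal goal at the right end, and a pessimistic initial estimate of -1/(1 - gamma). The single option "move right for `option_len` steps" is assumed to be available in every state, and all function and parameter names are illustrative.

```python
import numpy as np


def rollout(s, step, length, goal, gamma):
    """Move in one direction for up to `length` steps.

    Returns (discounted return, accumulated discount, end state).
    """
    total, disc = 0.0, 1.0
    for _ in range(length):
        if s == goal:
            break
        total += disc * -1.0              # cost of -1 per primitive step
        disc *= gamma
        s = min(max(s + step, 0), goal)   # stay inside the chain
    return total, disc, s


def iterations_to_converge(n_states=21, gamma=0.9, option_len=None, tol=1e-8):
    """Count synchronous value-iteration sweeps until V matches V*."""
    goal = n_states - 1
    dist = goal - np.arange(n_states)
    v_star = -(1.0 - gamma ** dist) / (1.0 - gamma)   # closed form for this chain

    # Each behaviour is (direction, duration); primitives last one step.
    behaviours = [(-1, 1), (+1, 1)]
    if option_len is not None:
        behaviours.append((+1, option_len))           # temporally extended action

    v = np.full(n_states, -1.0 / (1.0 - gamma))       # pessimistic initial estimate
    v[goal] = 0.0                                     # terminal state has value 0

    n_iter = 0
    while np.max(np.abs(v - v_star)) > tol:
        v_new = v.copy()
        for s in range(goal):
            backups = []
            for step, length in behaviours:
                r, disc, s_next = rollout(s, step, length, goal, gamma)
                backups.append(r + disc * v[s_next])
            v_new[s] = max(backups)
        v = v_new
        n_iter += 1
    return n_iter


primitive_only = iterations_to_converge()
with_option = iterations_to_converge(option_len=5)
print(primitive_only, with_option)
```

In this toy chain, accurate values spread outward from the goal one state per sweep with primitive actions alone, whereas the option lets each sweep carry them `option_len` states, so fewer sweeps are needed. The example says nothing about the suboptimal or sparsely placed options analyzed in the paper.
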
| Original language | English |
|---|---|
| Pages (from-to) | 228-250 |
| Number of pages | 23 |
| Journal | 31st International Conference on Machine Learning, ICML 2014 |
| Volume | 32 |
| State | Published - 2014 |
| Event | 31st International Conference on Machine Learning, ICML 2014 - Beijing, China. Duration: 21 Jun 2014 → 26 Jun 2014 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Computer Networks and Communications
- Software