Abstract
In Reinforcement Learning (RL), state-of-the-art algorithms require a large number of samples per state-action pair to estimate the transition kernel p. In many problems, a good approximation of p is not needed. For instance, if from one state-action pair (s, a) one can only transition to states with the same value, learning p(·|s, a) accurately is irrelevant (only its support matters). This paper aims at capturing such behavior by defining a novel hardness measure for Markov Decision Processes (MDPs) based on what we call the distribution-norm. The distribution-norm w.r.t. a measure ν is defined on zero ν-mean functions f as the standard deviation of f with respect to ν. We first provide a concentration inequality for the dual of the distribution-norm. This allows us to replace the problem-independent, loose ∥·∥₁ concentration inequalities used in most previous analyses of RL algorithms with a tighter, problem-dependent hardness measure. We then show that several common RL benchmarks have low hardness when measured using the new norm. The distribution-norm captures finer properties than the number of states or the diameter and can be used to assess the difficulty of MDPs.
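For concreteness, the definition sketched in the abstract can be written out as follows. The notation below (ν, f, μ, and the dual norm ∥·∥*ν) is our reconstruction from the abstract, not a verbatim excerpt from the paper:

```latex
% Reconstruction in our own notation (assumed, not taken verbatim from
% the paper): the distribution-norm w.r.t. \nu on zero \nu-mean
% functions f, and its dual norm evaluated on a signed measure \mu,
% e.g. the estimation error \hat{p}(\cdot|s,a) - p(\cdot|s,a).
\[
  \|f\|_{\nu} \;=\; \sqrt{\mathbb{E}_{\nu}\!\left[f^{2}\right]}
  \quad\text{with } \mathbb{E}_{\nu}[f]=0,
  \qquad
  \|\mu\|_{\nu}^{\ast} \;=\; \sup_{\|f\|_{\nu}\le 1} \int f \,\mathrm{d}\mu .
\]
```

Under this reading, the concentration inequality mentioned in the abstract would bound ∥p̂(·|s, a) − p(·|s, a)∥*ν, so the bound tightens exactly when the value-relevant fluctuations of f under ν are small.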
| Field | Detail |
|---|---|
| Original language | English |
| Pages (from-to) | 1835-1843 |
| Number of pages | 9 |
| Journal | Advances in Neural Information Processing Systems |
| Volume | 2 |
| Issue number | January |
| State | Published - 2014 |
| Event | 28th Annual Conference on Neural Information Processing Systems 2014, NIPS 2014, Montreal, Canada |
| Event duration | 8 Dec 2014 → 13 Dec 2014 |
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Information Systems
- Signal Processing