Abstract
Linear temporal logic (LTL) and, more generally, ω-regular objectives are alternatives to the traditional discounted-sum and average-reward objectives in reinforcement learning (RL), offering the advantage of greater comprehensibility and hence explainability. In this work, we study the relationship between these objectives. Our main result is that every RL problem with an ω-regular objective can be reduced, in an optimality-preserving fashion, to a limit-average reward problem via (finite-memory) reward machines. Furthermore, we demonstrate the efficacy of this approach by showing that optimal policies for limit-average problems can be found asymptotically by approximately solving a sequence of discounted-sum problems. Consequently, we resolve an open problem: optimal policies for LTL and ω-regular objectives can be learned asymptotically.
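The two ingredients described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's construction: the `RewardMachine` class and the `solve_limit_average_via_discounting` helper are hypothetical names introduced here. The first models a finite-memory reward machine as a Mealy machine over labels; the second captures the idea of approaching a limit-average-optimal policy by solving discounted problems whose discount factors tend to 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Tuple

@dataclass
class RewardMachine:
    """Finite-memory reward machine: a Mealy machine over labels.

    States track progress toward the omega-regular objective; each
    transition emits a scalar reward, and the objective is encoded in
    the long-run (limit-average) value of these rewards.
    """
    states: set
    initial: int
    delta: Dict[Tuple[int, str], Tuple[int, float]]  # (state, label) -> (next state, reward)

    def step(self, state: int, label: str) -> Tuple[int, float]:
        return self.delta[(state, label)]


def solve_limit_average_via_discounting(
    solve_discounted: Callable[[float], object],
    gammas: Iterable[float],
) -> Optional[object]:
    """Approximate a limit-average-optimal policy by solving a sequence of
    discounted-sum problems with discount factors tending to 1.

    `solve_discounted(gamma)` is assumed to return an (approximately)
    gamma-discounted-optimal policy; for finite models and gamma close
    enough to 1, such a policy is also average-optimal (Blackwell optimality).
    """
    policy = None
    for gamma in gammas:  # e.g. 0.9, 0.99, 0.999, ...
        policy = solve_discounted(gamma)
    return policy


if __name__ == "__main__":
    # Toy illustration only: a one-state machine emitting reward 1 on the
    # label "goal"; its limit-average reward is the long-run frequency of
    # "goal" visits.
    rm = RewardMachine(
        states={0},
        initial=0,
        delta={(0, "goal"): (0, 1.0), (0, "other"): (0, 0.0)},
    )
    print(rm.step(0, "goal"))  # (0, 1.0)

    # Hypothetical discounted solver stub, standing in for any RL algorithm.
    solve = lambda gamma: f"policy(gamma={gamma})"
    print(solve_limit_average_via_discounting(solve, [0.9, 0.99, 0.999]))
```

The sketch only fixes interfaces; the substance of the result lies in how the reward machine's rewards are chosen so that limit-average optimality coincides with maximizing the probability of satisfying the ω-regular objective.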
| | |
|---|---|
| Original language | English |
| Journal | Advances in Neural Information Processing Systems |
| Volume | 37 |
| State | Published - 2024 |
| Event | 38th Conference on Neural Information Processing Systems, NeurIPS 2024, Vancouver, Canada, 9 Dec 2024 – 15 Dec 2024 |
All Science Journal Classification (ASJC) codes
- Computer Networks and Communications
- Information Systems
- Signal Processing