Reinforcement Learning with LTL and ω-Regular Objectives via Optimality-Preserving Translation to Average Rewards

Xuan Bach Le, Dominik Wagner, Leon Witzman, Alexander Rabinovich, Luke Ong

Research output: Contribution to journal › Conference article › peer-review

Abstract

Linear temporal logic (LTL) and, more generally, ω-regular objectives are alternatives to the traditional discount-sum and average reward objectives in reinforcement learning (RL), offering the advantage of greater comprehensibility and hence explainability. In this work, we study the relationship between these objectives. Our main result is that each RL problem for ω-regular objectives can be reduced to a limit-average reward problem in an optimality-preserving fashion, via (finite-memory) reward machines. Furthermore, we demonstrate the efficacy of this approach by showing that optimal policies for limit-average problems can be found asymptotically by solving a sequence of discount-sum problems approximately. Consequently, we resolve an open problem: optimal policies for LTL and ω-regular objectives can be learned asymptotically.
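For orientation, a minimal sketch of the two objectives connected by the second result, written in standard notation (the symbols and precise conditions below are illustrative assumptions, not the paper's exact definitions): with r_t the reward at step t, the limit-average and γ-discounted values of a policy π are

\[
  \mathrm{Avg}(\pi) \;=\; \liminf_{T \to \infty} \frac{1}{T}\, \mathbb{E}_\pi\!\left[\sum_{t=0}^{T-1} r_t\right],
  \qquad
  V_\gamma(\pi) \;=\; \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^t \, r_t\right].
\]

For a stationary policy in a finite MDP, the classical Abelian relation \(\lim_{\gamma \to 1^-} (1-\gamma)\, V_\gamma(\pi) = \mathrm{Avg}(\pi)\) holds, which is the sense in which average-reward optima can be approached by solving a sequence of discount-sum problems with discount factors tending to 1.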

Original language: English
Journal: Advances in Neural Information Processing Systems
Volume: 37
State: Published - 2024
Event: 38th Conference on Neural Information Processing Systems, NeurIPS 2024 - Vancouver, Canada
Duration: 9 Dec 2024 - 15 Dec 2024

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Information Systems
  • Signal Processing
