LTLf/LDLf non-Markovian rewards

Ronen I. Brafman, Giuseppe De Giacomo, Fabio Patrizi

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

In Markov Decision Processes (MDPs), the reward obtained in a state is Markovian, i.e., depends on the last state and action. This dependency makes it difficult to reward more interesting long-term behaviors, such as always closing a door after it has been opened, or providing coffee only following a request. Extending MDPs to handle non-Markovian reward functions was the subject of two previous lines of work. Both use LTL variants to specify the reward function and then compile the new model back into a Markovian model. Building on recent progress in temporal logics over finite traces, we adopt LDLf for specifying non-Markovian rewards and provide an elegant automata construction for building a Markovian model, which extends that of previous work and offers strong minimality and compositionality guarantees.

Original language: American English
Title of host publication: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Pages: 1771-1778
Number of pages: 8
ISBN (Electronic): 9781577358008
State: Published - 1 Jan 2018
Event: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018 - New Orleans, United States
Duration: 2 Feb 2018 to 7 Feb 2018

Publication series

Name: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018

Conference

Conference: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018
Country/Territory: United States
City: New Orleans
Period: 2/02/18 to 7/02/18

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
