Reinforcement Learning in RDPs by Combining Deep RL with Automata Learning

Tal Shahar, Ronen I. Brafman

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Regular Decision Processes (RDPs) are a recently introduced model for decision-making in non-Markovian domains in which states are not postulated a priori, and the next observation depends in a regular manner on past history. As such, they provide a more succinct and understandable model of the dynamics and reward function. Existing algorithms for learning RDPs attempt to learn an automaton that reflects the regularity of the underlying domain. However, their scalability is limited due to the practical difficulty of learning automata. In this paper we propose to leverage the power of deep reinforcement learning in partially observable domains to learn RDPs: first, we learn an RNN-based policy. Then, we generate an automaton that reflects the policy's structure and use our old data to transform it into an MDP, which we solve. This results in a finite, explainable policy structure, and, as our empirical evaluation on old and new RDP benchmarks shows, much better sample complexity.
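
The last step described in the abstract (using previously collected data to turn an automaton into an MDP and then solving it) can be illustrated with a minimal sketch. This is not the authors' implementation: the automaton here is written by hand rather than extracted from an RNN policy, and every name in it (`estimate_mdp`, `solve_mdp`, the toy states, actions and observations) is hypothetical. The sketch assumes each logged step is an (action, observation, reward) triple, estimates transition probabilities and mean rewards over automaton states by counting, and solves the resulting MDP with plain value iteration.

```python
from collections import defaultdict


def estimate_mdp(episodes, delta, q0):
    """Estimate P and R over automaton states from logged episodes.

    episodes: list of episodes, each a list of (action, observation, reward)
    delta:    dict (state, action, observation) -> next automaton state
    q0:       initial automaton state
    """
    counts = defaultdict(lambda: defaultdict(int))  # (q, a) -> {q': count}
    reward_sum = defaultdict(float)                 # (q, a) -> summed reward
    for episode in episodes:
        q = q0
        for action, obs, reward in episode:
            q_next = delta[(q, action, obs)]
            counts[(q, action)][q_next] += 1
            reward_sum[(q, action)] += reward
            q = q_next
    P, R = {}, {}
    for sa, successors in counts.items():
        n = sum(successors.values())
        P[sa] = {q2: c / n for q2, c in successors.items()}
        R[sa] = reward_sum[sa] / n
    return P, R


def solve_mdp(P, R, gamma=0.9, tol=1e-8):
    """Value iteration; states with no logged action keep value 0."""
    states = {q for q, _ in P} | {q2 for succ in P.values() for q2 in succ}
    V = {q: 0.0 for q in states}

    def q_value(sa):
        return R[sa] + gamma * sum(p * V[q2] for q2, p in P[sa].items())

    while True:
        change = 0.0
        for q in states:
            values = [q_value(sa) for sa in P if sa[0] == q]
            if values:
                best = max(values)
                change = max(change, abs(best - V[q]))
                V[q] = best
        if change < tol:
            break
    policy = {}
    for q in states:
        options = [sa for sa in P if sa[0] == q]
        if options:
            policy[q] = max(options, key=q_value)[1]
    return V, policy


# Toy, hand-written automaton (an assumption for illustration only):
# the agent must grab a key before opening a door.
delta = {
    ("no_key", "grab", "k"): "has_key",
    ("no_key", "open", "n"): "no_key",
    ("has_key", "grab", "k"): "has_key",
    ("has_key", "open", "d"): "done",
}
episodes = [
    [("grab", "k", 0.0), ("open", "d", 1.0)],
    [("open", "n", 0.0), ("grab", "k", 0.0), ("open", "d", 1.0)],
]

P, R = estimate_mdp(episodes, delta, "no_key")
V, policy = solve_mdp(P, R)
print(policy)
```

In this toy setting the automaton state stands in for the relevant part of the history, so the solved policy is a finite lookup table over automaton states, which is the sense in which such a policy is finite and explainable.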

Original language: American English
Title of host publication: ECAI 2023 - 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings
Editors: Kobi Gal, Ann Nowe, Grzegorz J. Nalepa, Roy Fairstein, Roxana Radulescu
Publisher: IOS Press BV
Pages: 2097-2104
Number of pages: 8
ISBN (Electronic): 9781643684369
State: Published - 28 Sep 2023
Event: 26th European Conference on Artificial Intelligence, ECAI 2023 - Krakow, Poland
Duration: 30 Sep 2023 – 4 Oct 2023

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 372

Conference

Conference: 26th European Conference on Artificial Intelligence, ECAI 2023
Country/Territory: Poland
City: Krakow
Period: 30/09/23 – 4/10/23

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
