TY - GEN
T1 - Reinforcement Learning in RDPs by Combining Deep RL with Automata Learning
AU - Shahar, Tal
AU - Brafman, Ronen I.
N1 - Publisher Copyright: © 2023 The Authors.
PY - 2023/9/28
Y1 - 2023/9/28
N2 - Regular Decision Processes (RDPs) are a recently introduced model for decision-making in non-Markovian domains in which states are not postulated a priori, and the next observation depends in a regular manner on past history. As such, they provide a more succinct and understandable model of the dynamics and reward function. Existing algorithms for learning RDPs attempt to learn an automaton that reflects the regularity of the underlying domain. However, their scalability is limited by the practical difficulty of learning automata. In this paper we propose to leverage the power of deep reinforcement learning in partially observable domains to learn RDPs. First, we learn an RNN-based policy. Then, we generate an automaton that reflects the policy's structure and use our old data to transform it into an MDP, which we solve. This results in a finite, explainable policy structure and, as our empirical evaluation on old and new RDP benchmarks shows, much better sample complexity.
UR - http://www.scopus.com/inward/record.url?scp=85175786773&partnerID=8YFLogxK
U2 - https://doi.org/10.3233/FAIA230504
DO - https://doi.org/10.3233/FAIA230504
M3 - Conference contribution
T3 - Frontiers in Artificial Intelligence and Applications
SP - 2097
EP - 2104
BT - ECAI 2023 - 26th European Conference on Artificial Intelligence, including 12th Conference on Prestigious Applications of Intelligent Systems, PAIS 2023 - Proceedings
A2 - Gal, Kobi
A2 - Nowe, Ann
A2 - Nalepa, Grzegorz J.
A2 - Fairstein, Roy
A2 - Radulescu, Roxana
PB - IOS Press BV
T2 - 26th European Conference on Artificial Intelligence, ECAI 2023
Y2 - 30 September 2023 through 4 October 2023
ER -