TY - GEN
T1 - Online convex optimization in adversarial Markov decision processes
AU - Rosenberg, Aviv
AU - Mansour, Yishay
N1 - Publisher Copyright: © 2019 International Machine Learning Society (IMLS).
PY - 2019
Y1 - 2019
N2 - We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes, and the transition function is not known to the learner. We show an Õ(L|X|√(|A|T)) regret bound, where T is the number of episodes, X is the state space, A is the action space, and L is the length of each episode. Our online algorithm is implemented using entropic regularization methodology, which allows us to extend the original adversarial MDP model to handle convex performance criteria (different ways to aggregate the losses of a single episode), as well as to improve previous regret bounds.
AB - We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes, and the transition function is not known to the learner. We show an Õ(L|X|√(|A|T)) regret bound, where T is the number of episodes, X is the state space, A is the action space, and L is the length of each episode. Our online algorithm is implemented using entropic regularization methodology, which allows us to extend the original adversarial MDP model to handle convex performance criteria (different ways to aggregate the losses of a single episode), as well as to improve previous regret bounds.
UR - http://www.scopus.com/inward/record.url?scp=85078291010&partnerID=8YFLogxK
M3 - Conference contribution
T3 - 36th International Conference on Machine Learning, ICML 2019
SP - 9643
EP - 9651
BT - 36th International Conference on Machine Learning, ICML 2019
T2 - 36th International Conference on Machine Learning, ICML 2019
Y2 - 9 June 2019 through 15 June 2019
ER -