Online convex optimization in adversarial Markov decision processes

Aviv Rosenberg, Yishay Mansour

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We consider online learning in episodic loop-free Markov decision processes (MDPs), where the loss function can change arbitrarily between episodes, and the transition function is not known to the learner. We show an Õ(L|X|√(|A|T)) regret bound, where T is the number of episodes, X is the state space, A is the action space, and L is the length of each episode. Our online algorithm is implemented using the entropic regularization methodology, which allows us to extend the original adversarial MDP model to handle convex performance criteria (different ways to aggregate the losses of a single episode), as well as improve previous regret bounds.
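The entropic regularization mentioned in the abstract refers to online mirror descent with a negative-entropy (KL) regularizer. As a rough illustration only, and not the paper's actual algorithm (which works over occupancy measures and handles unknown transitions), the sketch below shows the core entropic-regularized update on a probability simplex; all function names and parameters here are hypothetical.

```python
import numpy as np

def entropic_omd_step(q, loss, eta):
    """One online-mirror-descent step with entropic regularization
    (a multiplicative-weights / exponentiated-gradient update).

    q    : current distribution over decisions, shape (n,)
    loss : loss vector revealed for the current episode, shape (n,)
    eta  : learning rate
    """
    w = q * np.exp(-eta * loss)   # unnormalized exponential-weights update
    return w / w.sum()            # KL projection back onto the simplex

# Toy usage: 3 decisions, adversarially changing losses over T episodes.
rng = np.random.default_rng(0)
T, n, eta = 100, 3, 0.1
q = np.full(n, 1.0 / n)
total_loss = 0.0
for t in range(T):
    loss = rng.random(n)          # adversarial loss, revealed after playing q
    total_loss += q @ loss
    q = entropic_omd_step(q, loss, eta)
```

In the paper's setting the distribution being updated is an occupancy measure over state-action pairs rather than a flat simplex, and the projection must respect the MDP's flow constraints, but the entropic update above is the basic building block.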

Original language: English
Title of host publication: 36th International Conference on Machine Learning, ICML 2019
Pages: 9643-9651
Number of pages: 9
ISBN (Electronic): 9781510886988
State: Published - 2019
Event: 36th International Conference on Machine Learning, ICML 2019 - Long Beach, United States
Duration: 9 Jun 2019 - 15 Jun 2019

Publication series

Name: 36th International Conference on Machine Learning, ICML 2019
Volume: 2019-June

Conference

Conference: 36th International Conference on Machine Learning, ICML 2019
Country/Territory: United States
City: Long Beach
Period: 9/06/19 - 15/06/19

All Science Journal Classification (ASJC) codes

  • Education
  • Computer Science Applications
  • Human-Computer Interaction

