Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning

Gal Dalal, Balázs Szörényi, Gugan Thoppe, Shie Mannor

Research output: Contribution to journalConference articlepeer-review

Abstract

Two-timescale Stochastic Approximation (SA) algorithms are widely used in Reinforcement Learning (RL). Their iterates have two parts that are updated using distinct stepsizes. In this work, we develop a novel recipe for their finite sample analysis. Using this, we provide a concentration bound, which is the first such result for a two-timescale SA. The type of bound we obtain is known as “lock-in probability”. We also introduce a new projection scheme, in which the time between successive projections increases exponentially. This scheme allows one to elegantly transform a lock-in probability into a convergence rate result for projected two-timescale SA. From this latter result, we then extract key insights on stepsize selection. As an application, we finally obtain convergence rates for the projected two-timescale RL algorithms GTD(0), GTD2, and TDC.

Original languageEnglish
Pages (from-to)1199-1233
Number of pages35
JournalProceedings of Machine Learning Research
Volume75
StatePublished - 2018
Event31st Annual Conference on Learning Theory, COLT 2018 - Stockholm, Sweden
Duration: 6 Jul 20189 Jul 2018

All Science Journal Classification (ASJC) codes

  • Software
  • Artificial Intelligence
  • Control and Systems Engineering
  • Statistics and Probability

Fingerprint

Dive into the research topics of 'Finite Sample Analysis of Two-Timescale Stochastic Approximation with Applications to Reinforcement Learning'. Together they form a unique fingerprint.

Cite this