Recomposing the Reinforcement Learning Building Blocks with Hypernetworks

Elad Sarafian, Shai Keynan, Sarit Kraus

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-reviewed

Abstract

The Reinforcement Learning (RL) building blocks, i.e. Q-functions and policy networks, usually take elements from the Cartesian product of two domains as input. In particular, the input of the Q-function is both the state and the action, and in multi-task problems (Meta-RL) the policy can take a state and a context. Standard architectures tend to ignore these variables' underlying interpretations and simply concatenate their features into a single vector. In this work, we argue that this choice may lead to poor gradient estimation in actor-critic algorithms and high-variance learning steps in Meta-RL algorithms. To consider the interaction between the input variables, we suggest using a Hypernetwork architecture where a primary network determines the weights of a conditional dynamic network. We show that this approach improves the gradient approximation and reduces the learning step variance, which together accelerate learning and improve the final performance. We demonstrate a consistent improvement across different locomotion tasks and different algorithms both in RL (TD3 and SAC) and in Meta-RL (MAML and PEARL).
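To make the architectural contrast concrete, here is a minimal NumPy sketch of a concatenation critic next to a hypernetwork critic. It is illustrative only, not the authors' implementation: it assumes (as one natural choice) that the state is the primary network's input and the action is the dynamic network's input, uses a one-hidden-layer dynamic network, and leaves all weights random and untrained. The names `mlp_init`, `mlp_forward`, `concat_q`, and `HyperQ` are hypothetical.

```python
import numpy as np


def mlp_init(sizes, rng):
    """Random (W, b) pairs for a plain MLP with the given layer sizes."""
    return [(rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]


def mlp_forward(params, x):
    """Affine layers with ReLU on every layer except the last."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return x


def concat_q(params, state, action):
    """Baseline critic: Q(s, a) = MLP([s; a]) on one concatenated vector."""
    return mlp_forward(params, np.concatenate([state, action], axis=-1))[:, 0]


class HyperQ:
    """Hypothetical hypernetwork critic: a primary network maps the state to
    the weights of a small dynamic network that is evaluated on the action."""

    def __init__(self, state_dim, action_dim, rng, hidden=32, dyn_hidden=8):
        self.a, self.h = action_dim, dyn_hidden
        # Dynamic parameters: W1 (a*h) + b1 (h) + W2 (h) + b2 (1).
        self.n_dyn = action_dim * dyn_hidden + 2 * dyn_hidden + 1
        self.primary = mlp_init([state_dim, hidden, self.n_dyn], rng)

    def dynamic_weights(self, state):
        """Split the primary network's flat output into per-sample weights."""
        theta = mlp_forward(self.primary, state)  # shape (B, n_dyn)
        a, h = self.a, self.h
        w1 = theta[:, : a * h].reshape(-1, a, h)
        b1 = theta[:, a * h : a * h + h]
        w2 = theta[:, a * h + h : a * h + 2 * h]
        b2 = theta[:, -1]
        return w1, b1, w2, b2

    def __call__(self, state, action):
        # Each sample's action passes through its own state-dependent network.
        w1, b1, w2, b2 = self.dynamic_weights(state)
        hid = np.maximum(np.einsum("ba,bah->bh", action, w1) + b1, 0.0)
        return np.einsum("bh,bh->b", hid, w2) + b2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A, B = 5, 3, 4
    states, actions = rng.normal(size=(B, S)), rng.normal(size=(B, A))
    baseline = mlp_init([S + A, 32, 1], rng)
    print(concat_q(baseline, states, actions).shape)  # (4,)
    critic = HyperQ(S, A, rng)
    print(critic(states, actions).shape)  # (4,)
```

The design difference is where the state enters: in `concat_q` it is just more input features, while in `HyperQ` it selects the function that is applied to the action, so the state conditions the network's weights rather than being mixed with the action in the first layer.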

Original language: English
Host publication title: Proceedings of the 38th International Conference on Machine Learning, ICML 2021
Pages: 9301-9312
Number of pages: 12
ISBN (Electronic): 9781713845065
Publication status: Published - 2021
Event: 38th International Conference on Machine Learning, ICML 2021 - Virtual, Online
Duration: 18 Jul 2021 to 24 Jul 2021

Publication series

Name: Proceedings of Machine Learning Research
Volume: 139

Conference

Conference: 38th International Conference on Machine Learning, ICML 2021
City: Virtual, Online
Period: 18/07/21 to 24/07/21

ASJC Scopus subject areas

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
