TY - GEN
T1 - STORM+
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
AU - Levy, Kfir Y.
AU - Kavis, Ali
AU - Cevher, Volkan
N1 - Publisher Copyright: © 2021 Neural information processing systems foundation. All rights reserved.
PY - 2021
Y1 - 2021
N2 - In this work we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to handling such problems is variance reduction techniques, which are also known to obtain tight convergence rates, matching the lower bounds in this case. Nevertheless, these techniques require a careful maintenance of anchor points in conjunction with appropriately selected "mega-batchsizes". This leads to a challenging hyperparameter tuning problem, that weakens their practicality. Recently, [Cutkosky and Orabona, 2019] have shown that one can employ recursive momentum in order to avoid the use of anchor points and large batchsizes, and still obtain the optimal rate for this setting. Yet, their method called STORM crucially relies on the knowledge of the smoothness, as well a bound on the gradient norms. In this work we propose STORM+, a new method that is completely parameter-free, does not require large batch-sizes, and obtains the optimal O(1/T 1/3) rate for finding an approximate stationary point. Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
AB - In this work we investigate stochastic non-convex optimization problems where the objective is an expectation over smooth loss functions, and the goal is to find an approximate stationary point. The most popular approach to handling such problems is variance reduction techniques, which are also known to obtain tight convergence rates, matching the lower bounds in this case. Nevertheless, these techniques require a careful maintenance of anchor points in conjunction with appropriately selected "mega-batchsizes". This leads to a challenging hyperparameter tuning problem, that weakens their practicality. Recently, [Cutkosky and Orabona, 2019] have shown that one can employ recursive momentum in order to avoid the use of anchor points and large batchsizes, and still obtain the optimal rate for this setting. Yet, their method called STORM crucially relies on the knowledge of the smoothness, as well a bound on the gradient norms. In this work we propose STORM+, a new method that is completely parameter-free, does not require large batch-sizes, and obtains the optimal O(1/T 1/3) rate for finding an approximate stationary point. Our work builds on the STORM algorithm, in conjunction with a novel approach to adaptively set the learning rate and momentum parameters.
UR - http://www.scopus.com/inward/record.url?scp=85132554824&partnerID=8YFLogxK
M3 - منشور من مؤتمر
T3 - Advances in Neural Information Processing Systems
SP - 20571
EP - 20582
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
Y2 - 6 December 2021 through 14 December 2021
ER -