Second-Order Information in Non-Convex Stochastic Optimization: Power and Limitations

Yossi Arjevani, Yair Carmon, John C. Duchi, Dylan J. Foster, Ayush Sekhari, Karthik Sridharan

Research output: Contribution to journal › Conference article › peer-review

Abstract

We design an algorithm which finds an ε-approximate stationary point (with ‖∇F(x)‖ ≤ ε) using O(ε⁻³) stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound establishing that this rate is optimal and, surprisingly, that it cannot be improved using stochastic pth-order methods for any p ≥ 2, even when the first p derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding (ε, γ)-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.
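
For context, the standard notions of approximate stationarity, which the abstract appears to follow, are sketched below (an assumption; the paper's exact constants and normalizations may differ):

```latex
% Standard definitions of approximate stationarity (assumed here; the
% paper's exact normalizations may differ).
% First-order: x is an epsilon-approximate stationary point of F if
% the gradient norm is at most epsilon.
% Second-order: x is an (epsilon, gamma)-approximate second-order
% stationary point if, in addition, the Hessian has no eigenvalue
% below -gamma.
\[
  \|\nabla F(x)\| \le \varepsilon
  \qquad \text{and} \qquad
  \nabla^2 F(x) \succeq -\gamma I .
\]
```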

Original language: English
Pages (from-to): 242-299
Number of pages: 58
Journal: Proceedings of Machine Learning Research
Volume: 125
State: Published - 2020
Externally published: Yes
Event: 33rd Conference on Learning Theory, COLT 2020 - Virtual, Online, Austria
Duration: 9 Jul 2020 – 12 Jul 2020

Keywords

  • Hessian-vector products
  • Stochastic optimization
  • non-convex optimization
  • second-order methods
  • variance reduction

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Software
  • Control and Systems Engineering
  • Statistics and Probability
