Abstract
We design an algorithm which finds an ϵ-approximate stationary point (with ∥∇F(x)∥ ≤ ϵ) using O(ϵ⁻³) stochastic gradient and Hessian-vector products, matching guarantees that were previously available only under a stronger assumption of access to multiple queries with the same random seed. We prove a lower bound which establishes that this rate is optimal and, surprisingly, that it cannot be improved using stochastic pth-order methods for any p ≥ 2, even when the first p derivatives of the objective are Lipschitz. Together, these results characterize the complexity of non-convex stochastic optimization with second-order methods and beyond. Expanding our scope to the oracle complexity of finding (ϵ, γ)-approximate second-order stationary points, we establish nearly matching upper and lower bounds for stochastic second-order methods. Our lower bounds here are novel even in the noiseless case.
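Since the abstract only states the first-order condition inline, the following is a minimal LaTeX sketch of the two stationarity notions it refers to. The (ϵ, γ) second-order condition shown here follows the standard convention (gradient norm at most ϵ, smallest Hessian eigenvalue at least −γ); this convention is an assumption on our part, as the abstract does not spell it out.

```latex
% Minimal sketch of the stationarity notions referenced in the abstract.
% The second-order condition is the standard convention (assumed here);
% the abstract itself only defines the first-order one.
\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}

% First-order: an \epsilon-approximate stationary point, found with
% O(\epsilon^{-3}) stochastic gradient and Hessian-vector products.
\[
  \|\nabla F(x)\| \le \epsilon .
\]

% Second-order: an (\epsilon,\gamma)-approximate second-order stationary point
% (gradient small and Hessian nearly positive semidefinite).
\[
  \|\nabla F(x)\| \le \epsilon
  \quad\text{and}\quad
  \nabla^{2} F(x) \succeq -\gamma I .
\]

\end{document}
```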
| Field | Value |
|---|---|
| Original language | English |
| Title of host publication | Proceedings of Thirty Third Conference on Learning Theory |
| Editors | Jacob Abernethy, Shivani Agarwal |
| Pages | 242-299 |
| Number of pages | 58 |
| Volume | 125 |
| State | Published - 1 Sep 2020 |
| Event | 33rd Annual Conference on Learning Theory, COLT 2020 (virtual); duration: 9 Jul 2020 → 12 Jul 2020; conference number: 33 |
Publication series
| Field | Value |
|---|---|
| Name | Proceedings of Machine Learning Research |
| Publisher | PMLR |
Conference
| Field | Value |
|---|---|
| Conference | 33rd Annual Conference on Learning Theory, COLT 2020 |
| Abbreviated title | COLT 2020 |
| Period | 9 Jul 2020 → 12 Jul 2020 |