Abstract
Recent results show that vanilla gradient descent can be accelerated for smooth convex objectives, merely by changing the stepsize sequence. We show that this can lead to surprisingly large errors indefinitely, and therefore ask: Is there any stepsize schedule for gradient descent that accelerates the classic O(1/T) convergence rate, at any stopping time T?
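To make the setup concrete, below is a minimal sketch of gradient descent driven by a prescribed stepsize sequence on a small smooth convex quadratic. The objective, horizon, and the occasional "long" steps exceeding 2/L are illustrative assumptions, loosely mimicking the flavor of recent accelerating schedules; this is not the paper's construction or any provably accelerating schedule.

```python
# Minimal sketch (illustrative, not the paper's construction): gradient
# descent on a smooth convex quadratic, run once with the classic constant
# stepsize 1/L and once with a schedule whose every 4th step exceeds 2/L.
import numpy as np

def gradient_descent(grad, x0, stepsizes):
    """Run GD with a prescribed stepsize sequence; return all iterates."""
    xs = [np.asarray(x0, dtype=float)]
    for eta in stepsizes:
        xs.append(xs[-1] - eta * grad(xs[-1]))
    return xs

# f(x) = 0.5 x^T A x is L-smooth with L = lambda_max(A) = 10.
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
L, T = 10.0, 20
x0 = np.array([1.0, 1.0])

constant = [1.0 / L] * T                             # classic eta_t = 1/L
long_step = [(3.0 if (t + 1) % 4 == 0 else 0.5) / L  # every 4th step > 2/L
             for t in range(T)]

for name, sched in [("constant 1/L", constant), ("long-step", long_step)]:
    vals = [f(x) for x in gradient_descent(grad, x0, sched)]
    rises = sum(b > a for a, b in zip(vals, vals[1:]))
    print(f"{name:12s} final f = {vals[-1]:.2e}, steps where f increased: {rises}")
```

On this toy problem the long-step run is not monotone: individual steps larger than 2/L can increase the objective even when the end-of-schedule error is small. This is the tension the abstract points to between schedules tuned for a fixed horizon and guarantees that hold at any stopping time T.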
Original language | English |
---|---|
Pages (from-to) | 5335-5339 |
Number of pages | 5 |
Journal | Proceedings of Machine Learning Research |
Volume | 247 |
State | Published - 2024 |
Event | 37th Annual Conference on Learning Theory, COLT 2024, Edmonton, Canada, 30 Jun 2024 – 3 Jul 2024 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability