Abstract
Existing depth separation results for constant-depth networks essentially show that certain radial functions in ℝ^d, which can be easily approximated by depth 3 networks, cannot be approximated by depth 2 networks, even up to constant accuracy, unless their size is exponential in d. However, the functions used to demonstrate this are rapidly oscillating, with a Lipschitz parameter scaling polynomially with the dimension d (or equivalently, after rescaling the function, the hardness result applies to O(1)-Lipschitz functions only when the target accuracy ε is at most poly(1/d)). In this paper, we study whether such depth separations might still hold in the natural setting of O(1)-Lipschitz radial functions, when ε does not scale with d. Perhaps surprisingly, we show that the answer is negative: in contrast to the intuition suggested by previous work, it is possible to approximate O(1)-Lipschitz radial functions with depth 2, size poly(d) networks, for every constant ε. We complement this by showing that approximating such functions is also possible with depth 2, size poly(1/ε) networks, for every constant d. Finally, we show that it is not possible to have polynomial dependence on both d and 1/ε simultaneously. Overall, our results indicate that in order to show depth separations for expressing O(1)-Lipschitz functions with constant accuracy (if at all possible), one would need fundamentally different techniques than the existing ones in the literature.
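To make the quantities in the abstract concrete, the following is a minimal formalization of the setting. The notation (φ, σ, k, μ, and the squared-loss criterion) is our own assumption, chosen to match common conventions in the depth-separation literature; the abstract itself does not fix it.

```latex
% Sketch of the setting under assumed notation -- the abstract fixes none of it.
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}
A \emph{radial} target on $\mathbb{R}^d$ and a depth~2 (one-hidden-layer)
network of width (size) $k$:
\begin{align*}
  f(x) &= \varphi\bigl(\lVert x \rVert_2\bigr),
    && \varphi \colon [0,\infty) \to \mathbb{R} \text{ is } O(1)\text{-Lipschitz},\\
  N(x) &= \sum_{i=1}^{k} u_i\, \sigma\bigl(\langle w_i, x \rangle + b_i\bigr),
    && \sigma \text{ an activation, e.g.\ } \sigma(z) = \max\{z, 0\}.
\end{align*}
``$\varepsilon$-approximation'' is then, e.g.,
$\mathbb{E}_{x \sim \mu}\bigl(N(x) - f(x)\bigr)^2 \le \varepsilon^2$
for a fixed input distribution $\mu$; the abstract's results concern how
$k$ must scale with $d$ and $1/\varepsilon$.
\end{document}
```

Under this reading, the paper's three results are: k = poly(d) suffices for any fixed ε, k = poly(1/ε) suffices for any fixed d, but k cannot be polynomial in both d and 1/ε simultaneously.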
| Original language | American English |
|---|---|
| Pages (from-to) | 2664-2666 |
| Number of pages | 3 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 99 |
| State | Published - 1 Jan 2019 |
| Event | 32nd Conference on Learning Theory, COLT 2019 - Phoenix, United States; Duration: 25 Jun 2019 → 28 Jun 2019; https://proceedings.mlr.press/v99 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability