Depth Separations in Neural Networks: What is Actually Being Separated?

Itay Safran, Ronen Eldan, Ohad Shamir

Research output: Contribution to journal › Article › peer-review

Abstract

Existing depth separation results for constant-depth networks essentially show that certain radial functions in ℝ^d, which can be easily approximated with depth 3 networks, cannot be approximated by depth 2 networks, even up to constant accuracy, unless their size is exponential in d. However, the functions used to demonstrate this are rapidly oscillating, with a Lipschitz parameter scaling polynomially with the dimension d (or equivalently, by rescaling the function, the hardness result applies to O(1)-Lipschitz functions only when the target accuracy ϵ is at most poly(1/d)). In this paper, we study whether such depth separations might still hold in the natural setting of O(1)-Lipschitz radial functions, when ϵ does not scale with d. Perhaps surprisingly, we show that the answer is negative: in contrast with the intuition suggested by previous work, it is possible to approximate O(1)-Lipschitz radial functions with depth 2, size poly(d) networks, for every constant ϵ. We complement this by showing that approximating such functions is also possible with depth 2, size poly(1/ϵ) networks, for every constant d. Finally, we show that it is not possible to have polynomial dependence on both d and 1/ϵ simultaneously. Overall, our results indicate that in order to show depth separations for expressing O(1)-Lipschitz functions with constant accuracy (if at all possible), one would need fundamentally different techniques than those existing in the literature.
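For readers outside this literature, the following sketch spells out the objects the abstract refers to, using the standard conventions of depth separation results rather than notation taken verbatim from the paper: a radial target with an O(1)-Lipschitz profile, a depth 2 (one hidden layer) network, and a depth 3 (two hidden layer) network whose extra layer is what makes radial functions easy to express.

```latex
% Sketch of the objects discussed in the abstract (standard conventions in the
% depth-separation literature; precise definitions are in the paper itself).

% A radial target on R^d: it depends on the input only through its Euclidean norm,
% with the one-dimensional profile phi assumed O(1)-Lipschitz.
\[
  f(\mathbf{x}) = \varphi\bigl(\lVert \mathbf{x} \rVert\bigr), \qquad \mathbf{x} \in \mathbb{R}^d .
\]

% A depth 2 network (one hidden layer) of size N with activation sigma:
\[
  N_2(\mathbf{x}) = \sum_{i=1}^{N} v_i \, \sigma\!\bigl(\langle \mathbf{w}_i, \mathbf{x}\rangle + b_i\bigr) + b_0 .
\]

% A depth 3 network composes a second nonlinear layer on top. Intuitively, the first
% layer can approximate x_j -> x_j^2 coordinate-wise, giving access to ||x||^2, and the
% second layer approximates the one-dimensional profile phi; this is why radial
% functions are "easy" for depth 3.
\[
  N_3(\mathbf{x}) = \sum_{j=1}^{M} u_j \, \sigma\!\Bigl( \sum_{i=1}^{N} v_{j,i}\,
      \sigma\!\bigl(\langle \mathbf{w}_i, \mathbf{x}\rangle + b_i\bigr) + c_j \Bigr) + c_0 .
\]
```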
Original language: English
Pages (from-to): 225-257
Number of pages: 33
Journal: Constructive Approximation
Volume: 55
Issue number: 1
Early online date: 2 Jun 2021
DOIs
State: Published - Feb 2022
