Abstract
We consider the fundamental problem of learning a single neuron x → σ(w⊤x) in a realizable setting, using standard gradient methods with random initialization, and under general families of input distributions and activations. On the one hand, we show that some assumptions on both the distribution and the activation function are necessary. On the other hand, we prove positive guarantees under mild assumptions, which go significantly beyond those studied in the literature so far. We also point out and study the challenges in further strengthening and generalizing our results.
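For concreteness, a minimal sketch of the setting the abstract describes: gradient descent from random initialization on the squared loss for a single neuron, with labels generated by a ground-truth neuron (the realizable setting). The concrete choices below (Gaussian inputs, ReLU activation, step size, initialization scale) are illustrative assumptions, not the paper's exact conditions.

```python
import numpy as np

# Sketch: learning a single neuron x -> sigma(w^T x) by gradient descent
# on the squared loss, with realizable (noise-free) labels. All concrete
# choices here are illustrative assumptions.

rng = np.random.default_rng(0)
d, n = 10, 5000

sigma = lambda z: np.maximum(z, 0.0)      # ReLU activation (assumption)
w_star = rng.standard_normal(d)           # ground-truth neuron
w_star /= np.linalg.norm(w_star)

X = rng.standard_normal((n, d))           # Gaussian inputs (assumption)
y = sigma(X @ w_star)                     # realizable labels: no noise

w = 0.1 * rng.standard_normal(d)          # random initialization
eta = 0.1                                 # step size (assumption)

for t in range(500):
    pred = sigma(X @ w)
    # Gradient of the empirical loss (1/2n) * sum_i (sigma(w.x_i) - y_i)^2;
    # for ReLU, sigma'(z) = 1[z > 0].
    grad = ((pred - y) * (X @ w > 0)) @ X / n
    w -= eta * grad

print("distance to target:", np.linalg.norm(w - w_star))
```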
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 3756–3786 |
| Number of pages | 31 |
| Journal | Proceedings of Machine Learning Research |
| Volume | 125 |
| State | Published - 2020 |
| Event | 33rd Conference on Learning Theory (COLT 2020), Virtual/Online, Austria, 9–12 Jul 2020 |
All Science Journal Classification (ASJC) codes
- Artificial Intelligence
- Software
- Control and Systems Engineering
- Statistics and Probability