Abstract
We analyze classification problems in which data is generated by a two-tiered random process. The class is generated first, then a layer of conditionally independent hidden variables, and finally the observed variables. For sources like this, the Bayes-optimal rule for predicting the class given the values of the observed variables is a two-layer neural network. We show that, if the hidden variables have non-negligible effects on many observed variables, a linear classifier approximates the error rate of the Bayes-optimal classifier up to lower-order terms. We also show that the hinge loss of a linear classifier is not much more than the Bayes error rate, which implies that an accurate linear classifier can be found efficiently.
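The two-tiered process described above can be illustrated with a small simulation. The following is a minimal sketch under assumptions that are not taken from the paper: all variables take values in {-1, +1}, each observed variable copies a single hidden parent, and the names and values `k`, `m`, `p`, `q` are hypothetical. The linear rule used here is a plain majority vote, chosen for simplicity rather than as the classifier analyzed in the paper.

```python
import numpy as np

# Hypothetical parameters (assumptions, not from the paper):
K, M = 20, 5        # K hidden variables, each with M observed children
P, Q = 0.7, 0.8     # P(hidden == class), P(observed == its hidden parent)

def sample(n_samples, k=K, m=M, p=P, q=Q, seed=0):
    """Draw class -> hidden layer -> observed layer, all in {-1, +1}."""
    rng = np.random.default_rng(seed)
    y = rng.choice([-1, 1], size=n_samples)
    # Hidden variables are conditionally independent given the class.
    h = np.where(rng.random((n_samples, k)) < p, y[:, None], -y[:, None])
    # Columns i*m .. i*m+m-1 of x are the children of hidden variable i.
    parent = np.repeat(h, m, axis=1)
    x = np.where(rng.random((n_samples, k * m)) < q, parent, -parent)
    return x, y

def bayes_predict(x, k=K, m=M, p=P, q=Q):
    """Bayes-optimal rule for this toy source (uniform class prior).

    Each hidden variable contributes one nonlinear unit, and the units
    are summed, which gives the two-layer form.
    """
    groups = x.reshape(x.shape[0], k, m)
    pos = (groups == 1).sum(axis=2)          # +1 children per hidden variable
    neg = m - pos
    log_a = pos * np.log(q) + neg * np.log(1 - q)   # log P(children | h=+1)
    log_b = neg * np.log(q) + pos * np.log(1 - q)   # log P(children | h=-1)
    ll_pos = np.logaddexp(np.log(p) + log_a, np.log(1 - p) + log_b)
    ll_neg = np.logaddexp(np.log(1 - p) + log_a, np.log(p) + log_b)
    llr = (ll_pos - ll_neg).sum(axis=1)      # log-likelihood ratio for y=+1
    return np.where(llr >= 0, 1, -1)

def linear_predict(x):
    """Equal-weight linear classifier (majority vote over observed variables)."""
    return np.where(x.sum(axis=1) >= 0, 1, -1)

x, y = sample(20000)
bayes_err = float(np.mean(bayes_predict(x) != y))
linear_err = float(np.mean(linear_predict(x) != y))
print(f"Bayes-optimal error: {bayes_err:.4f}")
print(f"Linear (majority) error: {linear_err:.4f}")
```

Increasing `M`, so that each hidden variable affects more observed variables, is one way to explore in this toy setting how the gap between the two error rates behaves.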
| Original language | English |
|---|---|
| Pages (from-to) | 209-231 |
| Number of pages | 23 |
| Journal | Machine Learning |
| Volume | 86 |
| Issue number | 2 |
| State | Published - Feb 2012 |
Keywords
- Bayes-optimal
- Hidden variables
- Learning theory
- Linear classification
All Science Journal Classification (ASJC) codes
- Software
- Artificial Intelligence