Abstract
We propose a novel class of kernels to alleviate the high computational cost of large-scale nonparametric learning with kernel methods. The proposed kernel is defined based on a hierarchical partitioning of the underlying data domain, where the Nyström method (a globally low-rank approximation) is married with a locally lossless approximation in a hierarchical fashion. The kernel maintains (strict) positive-definiteness. The corresponding kernel matrix admits a recursively off-diagonal low-rank structure, which allows for fast linear algebra computations. Suppressing the factor of data dimension, the memory and arithmetic complexities for training a regression or a classifier are reduced from O(n2) and O(n3) to O(nr) and O(nr2), respectively, where n is the number of training examples and r is the rank on each level of the hierarchy. Although other randomized approximate kernels entail a similar complexity, empirical results show that the proposed kernel achieves a matching performance with a smaller r. We demonstrate comprehensive experiments to show the effective use of the proposed kernel on data sizes up to the order of millions.
| Original language | English |
|---|---|
| Pages (from-to) | 1-42 |
| Number of pages | 42 |
| Journal | Journal of Machine Learning Research |
| Volume | 18 |
| State | Published - 1 Aug 2017 |
Keywords
- Hierarchical kernels
- Nonparametric learning
All Science Journal Classification (ASJC) codes
- Control and Systems Engineering
- Software
- Statistics and Probability
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'Hierarchically compositional kernels for scalable nonparametric learning'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver