TY - GEN
T1 - Coresets for Decision Trees of Signals
AU - Jubran, Ibrahim
AU - Sanches Shayda, Ernesto Evgeniy
AU - Newman, Ilan
AU - Feldman, Dan
N1 - Publisher Copyright: © 2021 Neural information processing systems foundation. All rights reserved.
PY - 2021
Y1 - 2021
N2 - A k-decision tree t (or k-tree) is a recursive partition of a matrix (2D-signal) into k ≥ 1 block matrices (axis-parallel rectangles, leaves), where each rectangle is assigned a real label. Its regression or classification loss with respect to a given matrix D of N entries (labels) is the sum, over every label in D, of the squared difference between that label and the label assigned to it by t. Given an error parameter ε ∈ (0, 1), a (k, ε)-coreset C of D is a small summarization that provably approximates this loss for every such tree, up to a multiplicative factor of 1 ± ε. In particular, the optimal k-tree of C is a (1 + ε)-approximation to the optimal k-tree of D. We provide the first algorithm that outputs such a (k, ε)-coreset for every such matrix D. The size |C| of the coreset is polynomial in k log(N)/ε, and its construction takes O(Nk) time. This is achieved by forging a link between decision trees from machine learning and partition trees from computational geometry. Experimental results on sklearn and LightGBM show that applying our coresets to real-world datasets speeds up the computation of random forests and their parameter tuning by up to 10x, while keeping similar accuracy. Full open source code is provided.
UR - http://www.scopus.com/inward/record.url?scp=85131951647&partnerID=8YFLogxK
M3 - Conference contribution
T3 - Advances in Neural Information Processing Systems
SP - 30352
EP - 30364
BT - Advances in Neural Information Processing Systems 34 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
A2 - Ranzato, Marc'Aurelio
A2 - Beygelzimer, Alina
A2 - Dauphin, Yann
A2 - Liang, Percy S.
A2 - Wortman Vaughan, Jenn
T2 - 35th Conference on Neural Information Processing Systems, NeurIPS 2021
Y2 - 6 December 2021 through 14 December 2021
ER -