Abstract
The construction of efficient decision and classification trees is a fundamental task in Big Data analytics and is known to be NP-hard. Accordingly, many greedy heuristics have been suggested for constructing decision trees, but they were found to yield locally optimal solutions. In this work we present the dual information distance (DID) method for the efficient construction of decision trees; the method is computationally attractive yet relatively robust to noise. The DID heuristic selects features by considering both their immediate contribution to the classification and their potential future effects. It represents the construction of classification trees as a search for the shortest paths over a graph of partitions that are defined by the selected features. The DID method takes into account both the orthogonality between the selected partitions and the reduction of uncertainty about the class partition given the selected attributes. We show that the DID method often outperforms popular classifiers in terms of average depth and classification accuracy.
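The abstract does not state the distance formula, so the following is only an illustrative sketch of one common way to measure the distance between two partitions: the information (Rokhlin) distance d(X, Y) = H(X | Y) + H(Y | X). Treating this as the distance used by DID is an assumption, and the function names `entropy`, `conditional_entropy`, and `information_distance` are hypothetical, not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of the partition induced by `labels`."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def conditional_entropy(x, y):
    """H(X | Y) = H(X, Y) - H(Y) for two equally long label sequences."""
    return entropy(list(zip(x, y))) - entropy(y)


def information_distance(x, y):
    """Assumed Rokhlin-style distance: d(X, Y) = H(X | Y) + H(Y | X)."""
    return conditional_entropy(x, y) + conditional_entropy(y, x)


# Toy data: four samples, one class partition, two candidate features.
classes = [0, 0, 1, 1]
feature_a = [0, 0, 1, 1]  # induces the same partition as the class labels
feature_b = [0, 1, 0, 1]  # statistically independent of the class labels

dist_a = information_distance(feature_a, classes)
dist_b = information_distance(feature_b, classes)
```

In this toy case the feature that reproduces the class partition lies at distance zero from it, while the independent feature is farther away. A greedy step in the spirit of the abstract would prefer features that move the current partition closer to the class partition.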
Original language | English |
---|---|
Pages (from-to) | 133-147 |
Number of pages | 15 |
Journal | Quality Technology and Quantitative Management |
Volume | 11 |
Issue number | 1 |
State | Published - Mar 2014 |
Keywords
- Average path length
- Big-data
- C4.5
- Decision trees
- Online classifiers
All Science Journal Classification (ASJC) codes
- Business and International Management
- Industrial relations
- Management Science and Operations Research
- Information Systems and Management
- Management of Technology and Innovation