Abstract
Decision forests, such as random forest (RF), are widely used for tabular data, mainly due to their predictive performance and ease of use. However, because the forest's trees may produce contradictory predictions for a given sample, using decision forests in decision-making applications requires a further assessment of the predictions' reliability in order to generate a trustworthy combined prediction. Model distillation using a born-again tree is a common approach for converting a decision forest into a single decision tree (DT) while preserving the forest's predictive performance and supporting decision-makers. In this paper, we introduce PnT (Path in Tree), a novel approach that learns a path-based encoding from a decision forest. PnT applies an iterative algorithm in which each iteration trains a batch of trees and then uses them to identify informative paths. These paths are then encoded and used by PnT-DT, an approach for producing a contradiction-free born-again DT. We also show that PnT can be leveraged for PnT-RF, a born-again forest approach capable of improving the predictive performance of a plain decision forest. We evaluate PnT-DT and PnT-RF on 40 classification datasets and demonstrate that both significantly outperform existing state-of-the-art (SOTA) born-again DT and decision forest methods in terms of predictive performance.
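The abstract only sketches PnT at a high level. As a rough illustration of the general idea it describes (training a batch of trees, extracting the root-to-leaf paths that samples follow, and fitting a single tree on a path-based encoding), the minimal sketch below uses scikit-learn; all names, parameters, and steps here are illustrative assumptions, not the authors' PnT algorithm.

```python
# Illustrative sketch only: train a batch of trees, encode each sample by the
# leaf (root-to-leaf path) it reaches in every tree, then fit a single
# "born-again"-style tree on that encoding. This is NOT the paper's PnT method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# One "batch" of trees (a single iteration of a hypothetical iterative loop).
batch = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

# leaf_ids[i, t] is the leaf that sample i reaches in tree t.
leaf_ids = batch.apply(X)  # shape: (n_samples, n_trees)

# One-hot encode the reached leaves per tree to obtain a path-based encoding.
encodings = []
for t in range(leaf_ids.shape[1]):
    leaves = np.unique(leaf_ids[:, t])
    encodings.append((leaf_ids[:, t][:, None] == leaves[None, :]).astype(int))
path_features = np.hstack(encodings)

# Fit a single surrogate tree on the path encoding.
surrogate = DecisionTreeClassifier(max_depth=5, random_state=0).fit(path_features, y)
print("Surrogate training accuracy:", surrogate.score(path_features, y))
```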
| Original language | American English |
| --- | --- |
| Article number | 102545 |
| Journal | Information Fusion |
| Volume | 112 |
| DOIs | |
| State | Published - 1 Dec 2024 |
Keywords
- Born-again
- Decision forest
- Decision tree
- Random forest
All Science Journal Classification (ASJC) codes
- Software
- Signal Processing
- Information Systems
- Hardware and Architecture