ConfDTree: A statistical method for improving decision trees

Research output: Contribution to journal › Article › peer-review

Abstract

Decision trees have three main disadvantages: reduced performance when the training set is small; rigid decision criteria; and the fact that a single "uncharacteristic" attribute might "derail" the classification process. In this paper we present ConfDTree (Confidence-Based Decision Tree), a post-processing method that enables decision trees to better classify outlier instances. This method, which can be applied to any decision tree algorithm, uses easy-to-implement statistical methods (confidence intervals and two-proportion tests) to identify hard-to-classify instances and to propose alternative routes. The experimental study indicates that the proposed post-processing method consistently and significantly improves the predictive performance of decision trees, particularly for small, imbalanced, or multi-class datasets, for which an average improvement of 5%-9% in AUC performance is reported.
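The core idea in the abstract can be sketched in a few lines: instead of routing an instance through a crisp numeric split, build a confidence band around the split threshold; if the instance falls inside the band, treat it as hard to classify and choose the branch whose training-set accuracy is significantly better under a two-proportion test. This is a minimal illustrative sketch, not the authors' exact procedure: the band construction (`z * sigma / sqrt(n)`) and the per-branch `(hits, total)` bookkeeping are assumptions made for the example.

```python
import math

def two_prop_z(s1, n1, s2, n2):
    """Pooled two-proportion z statistic comparing success rates s1/n1 vs s2/n2."""
    p = (s1 + s2) / (n1 + n2)                       # pooled proportion
    se = math.sqrt(p * (1 - p) * (1.0 / n1 + 1.0 / n2))
    return (s1 / n1 - s2 / n2) / se if se > 0 else 0.0

def route(x, threshold, sigma, n, left, right, z=1.96):
    """Route attribute value x through a split 'x <= threshold'.

    sigma, n: std-dev and size of the training sample seen at this node,
    used to build a confidence band around the cut point. left, right:
    (majority_class_hits, total) counts per branch on training data.
    Both are hypothetical bookkeeping choices for this sketch.
    """
    half = z * sigma / math.sqrt(n)                 # confidence band half-width
    if abs(x - threshold) > half:                   # clearly on one side: normal routing
        return "left" if x <= threshold else "right"
    # Ambiguous region: prefer the branch whose majority-class proportion is
    # significantly higher (two-proportion test); otherwise keep the default route.
    zstat = two_prop_z(left[0], left[1], right[0], right[1])
    if abs(zstat) >= z:
        return "left" if zstat > 0 else "right"
    return "left" if x <= threshold else "right"
```

For example, an instance far from the threshold is routed normally, while one inside the band is sent to the statistically stronger branch even if it sits on the weaker side of the cut point.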

Original language: American English
Pages (from-to): 392-407
Number of pages: 16
Journal: Journal of Computer Science and Technology
Volume: 29
Issue number: 3
DOIs
State: Published - 1 Jan 2014

Keywords

  • confidence interval
  • decision tree
  • imbalanced dataset

All Science Journal Classification (ASJC) codes

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computer Science Applications
  • Computational Theory and Mathematics
