Abstract
In this paper, we present the results of experimental studies on the existence of totally optimal decision trees (trees that are optimal relative to two or more cost functions simultaneously) for nine decision tables from the UCI Machine Learning Repository. Such trees can be useful when we consider decision trees as algorithms for problem solving or as a means of knowledge representation. As cost functions, we use depth, average depth, and number of nodes. We study not only exact but also approximate decision trees based on five uncertainty measures: entropy, Gini index, misclassification error, relative misclassification error, and number of unordered pairs of rows with different decisions. To investigate the existence of totally optimal trees, we use an extension of dynamic programming that allows multi-stage optimization of decision trees relative to a sequence of cost functions. Experimental results show that totally optimal decision trees exist in many cases. The graphs describing how the number of decision tables with totally optimal decision trees depends on the accuracy of the trees behave mostly irregularly. However, some trends can be observed, in particular an upward trend as accuracy decreases.
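The five uncertainty measures named in the abstract can be illustrated with a minimal sketch. The abstract does not state the formulas, so the definitions below are assumptions based on common usage: with N rows, N_j rows carrying decision j, and p_j = N_j / N, entropy is taken as -Σ p_j log2 p_j, the Gini index as 1 - Σ p_j², misclassification error as N - max_j N_j, its relative form as that value divided by N, and the pair count as (N² - Σ N_j²) / 2. The function name `uncertainty_measures` and the input format (a list of decision labels, one per row) are hypothetical.

```python
from collections import Counter
from math import log2


def uncertainty_measures(decisions):
    """Compute five uncertainty measures for the decision column of a table.

    `decisions` holds one decision label per row. All formulas are assumed
    definitions, not taken from the paper itself.
    """
    n = len(decisions)
    if n == 0:
        raise ValueError("the decision table must contain at least one row")

    counts = Counter(decisions)               # N_j for each decision j
    probs = [c / n for c in counts.values()]  # p_j = N_j / N

    entropy = -sum(p * log2(p) for p in probs)
    gini = 1 - sum(p * p for p in probs)
    # Rows that do not carry the most common decision.
    misclassification = n - max(counts.values())
    relative_misclassification = misclassification / n
    # Unordered pairs of rows whose decisions differ.
    differing_pairs = (n * n - sum(c * c for c in counts.values())) // 2

    return {
        "entropy": entropy,
        "gini": gini,
        "misclassification_error": misclassification,
        "relative_misclassification_error": relative_misclassification,
        "differing_pairs": differing_pairs,
    }


if __name__ == "__main__":
    # Toy decision column: two rows labelled "a" and two labelled "b".
    print(uncertainty_measures(["a", "a", "b", "b"]))
```

Under these assumed definitions, every measure equals zero exactly when all rows share one decision, which is why any of them can serve as a stopping threshold when building approximate trees.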
Original language | English (US) |
---|---|
Pages (from-to) | 245-261 |
Number of pages | 17 |
Journal | Fundamenta Informaticae |
Volume | 165 |
Issue number | 3-4 |
State | Published - Mar 22 2019 |
Bibliographical note
KAUST Repository Item: Exported on 2020-10-01.
Acknowledgements: Research reported in this publication was supported by King Abdullah University of Science and Technology (KAUST). We are greatly indebted to the anonymous reviewers for useful comments and suggestions.