THE CART ALGORITHM
 

 

 

 

 

 

 

 

 

 

 

 

 

 

The CART algorithm identifies candidate subtrees through a process of repeated pruning. The goal is to prune first those branches providing the least additional predictive power per leaf. In order to identify these least useful branches, CART relies on a concept called the adjusted error rate. This is a mea­ sure that increases each node’s misclassification rate on the training set by imposing a complexity penalty based on the number of leaves in the tree. The adjusted error rate is used to identify weak branches (those whose misclassifi­ cation rate is not low enough to overcome the penalty) and mark them for pruning. 186 The error rate on the validation set should be larger than the error rate on the training set, because the training set was used to build the rules in the model.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

A large difference in the misclassification error rate, however, is a symptom of an unstable model. This difference can show up in several ways as shown by the following three graphs generated by SAS Enterprise Miner. The graphs represent the percent of records correctly classified by the candidate models in a decision tree. Candidate subtrees with fewer nodes are on the left; with more nodes are on the right. These figures show the percent correctly classified instead of the error rate, so they are upside down from the way similar charts are shown elsewhere in this book. As expected, the first chart shows the candidate trees performing better and better on the training set as the trees have more and more nodes—the training process stops when the performance no longer improves. On the validation set, however, the candidate trees reach a peak and then the performance starts to decline as the trees get larger. The optimal tree is the one that works on the validation set, and the choice is easy because the peak is well-defined. This chart shows a clear inflection point in the graph of the percent correctly classified in the validation set. Decision Trees Sometimes, though, there is not clear demarcation point. That is, the performance of the candidate models on the validation set never quite reaches a maximum as the trees get larger. In this case, the pruning algorithm chooses the entire tree (the largest possible subtree), as shown in the following illustration: