EVALUATING THE FINAL TREE
 
Because this pruning algorithm is based solely on misclassification rate, without taking the probability of each classification into account, it replaces any subtree whose leaves all make the same classification with a common parent that also makes that classification. In applications where the goal is to select a small proportion of the records (the top 1 percent or 10 percent, for example), this pruning can hurt the performance of the tree. Some of the removed leaves contain a very high proportion of the target class, and once they are merged into their parent, those records can no longer be distinguished from the rest of the parent's records. Some tools, such as SAS Enterprise Miner, allow the user to prune trees optimally for such situations.

Using the Test Set to Evaluate the Final Tree

The winning subtree was selected on the basis of its overall error rate when applied to the task of classifying the records in the validation set.
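The evaluation procedure in this section relies on three disjoint, preclassified datasets: one to grow the tree, one to choose among candidate subtrees, and one held back for the final estimate. The sketch below shows that division of labor. It is a minimal illustration, not a particular tool's implementation, and it assumes records are (features, label) pairs and that each candidate subtree can be treated as a plain function from features to a predicted label. The 60/20/20 proportions and the names `three_way_split`, `error_rate`, and `select_subtree` are hypothetical choices made for the example.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a record is a (features, label) pair and a
# candidate subtree is any function that maps features to a predicted label.
Record = Tuple[object, object]
Classifier = Callable[[object], object]


def three_way_split(records: Sequence[Record], train_frac: float = 0.6,
                    valid_frac: float = 0.2, seed: int = 0
                    ) -> Tuple[List[Record], List[Record], List[Record]]:
    """Shuffle once, then cut into disjoint training, validation, and test sets."""
    if train_frac <= 0 or valid_frac <= 0 or train_frac + valid_frac >= 1:
        raise ValueError("fractions must be positive and leave room for a test set")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


def error_rate(classify: Classifier, labeled: Sequence[Record]) -> float:
    """Fraction of labeled records that the classifier gets wrong."""
    return sum(classify(x) != y for x, y in labeled) / len(labeled)


def select_subtree(candidates: Sequence[Classifier],
                   valid: Sequence[Record]) -> Tuple[Classifier, float]:
    """Pick the candidate with the lowest validation error; the test set is not used."""
    best = min(candidates, key=lambda c: error_rate(c, valid))
    return best, error_rate(best, valid)
```

In use, `train, valid, test = three_way_split(records)` partitions the data, the tree is grown on `train`, and `select_subtree(candidates, valid)` picks the winner. `test` is deliberately left untouched until that choice has been made.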

While we expect the selected subtree to remain the best-performing subtree when applied to other datasets, the error rate that caused it to be selected may slightly overstate its effectiveness. A large number of subtrees are likely to perform about as well as the one selected. To a certain extent, the one that delivered the lowest error rate on the validation set may simply have “gotten lucky” with that particular collection of records. For that reason, the selected subtree is applied to a third preclassified dataset that is disjoint from both the validation set and the training set. This third dataset is called the test set. The error rate obtained on the test set is used to estimate the expected performance of the classification rules represented by the selected tree when applied to unclassified data.

Do not evaluate the performance of a model by its lift or error rate on the validation set. Like the training set, the validation set has had a hand in creating the model, so it will overstate the model’s accuracy. Always measure the model’s accuracy on a test set that is drawn from the same population as the training and validation sets but has not been used in any way to create the model.
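The “gotten lucky” effect can be checked with a small simulation. This is a sketch under a deliberately simplified, hypothetical assumption: every candidate subtree has exactly the same true error rate (0.20 here), so any difference in validation error is pure sampling noise. Picking the minimum over many such candidates then produces a validation error that is biased downward, while the winner's error on a disjoint test set remains an unbiased estimate. The function name and all parameter values are illustrative.

```python
import random
import statistics
from typing import Tuple


def simulate_selection_optimism(n_candidates: int = 30, n_valid: int = 500,
                                n_test: int = 500, true_error: float = 0.20,
                                n_trials: int = 100, seed: int = 1
                                ) -> Tuple[float, float]:
    """Return (mean validation error of the winner, mean test error of the winner).

    Simplifying assumption: every candidate subtree has the same true error
    rate, so differences in validation error are pure sampling noise.
    """
    rng = random.Random(seed)

    def observed_error(n: int) -> float:
        # Each of n records is misclassified independently with prob. true_error.
        return sum(rng.random() < true_error for _ in range(n)) / n

    winner_valid, winner_test = [], []
    for _ in range(n_trials):
        # Score every candidate on the validation set and keep the lowest error.
        winner_valid.append(min(observed_error(n_valid) for _ in range(n_candidates)))
        # Score the winner on a disjoint test set that played no part in choosing
        # it; under the equal-true-error assumption this is simply a fresh draw.
        winner_test.append(observed_error(n_test))
    return statistics.mean(winner_valid), statistics.mean(winner_test)


if __name__ == "__main__":
    valid_err, test_err = simulate_selection_optimism()
    print(f"winner's validation error: {valid_err:.3f}")
    print(f"winner's test error:       {test_err:.3f}")
```

With these settings the winner's validation error typically lands well below 0.20, while its test error stays close to 0.20. That gap is exactly the overstatement the test set is meant to expose. Increasing `n_candidates` widens it, because the minimum is taken over more noisy draws.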