TAKING COST INTO ACCOUNT
 
In the discussion so far, the error rate has been the sole measure for evaluating the fitness of rules and subtrees. In many applications, however, the costs of misclassification vary from class to class. Certainly, in a medical diagnosis, a false negative can be more harmful than a false positive; a scary Pap smear result that, on further investigation, proves to have been a false positive is much preferable to an undetected cancer. A cost function multiplies the probability of misclassification by a weight indicating the cost of that misclassification. Several tools allow the use of such a cost function instead of an error function for building decision trees.

Further Refinements to the Decision Tree Method

Although they are not found in most commercial data mining software packages, there are some interesting refinements to the basic decision tree method that are worth discussing.

Using More Than One Field at a Time

Most decision tree algorithms test a single variable to perform each split. This approach can be problematic for several reasons, not least of which is that it can lead to trees with more nodes than necessary.
Extra nodes are cause for concern because only the training records that arrive at a given node are available for inducing the subtree below it. The fewer training examples per node, the less stable the resulting model.

Suppose that we are interested in a condition for which both age and gender are important indicators. If the root node split is on age, then each child node contains only about half the women. If the initial split is on gender, then each child node contains only about half the old folks.

Several algorithms have been developed to allow multiple attributes to be used in combination to form the splitter. One technique forms Boolean conjunctions of features in order to reduce the complexity of the tree. After finding the feature that forms the best split, the algorithm looks for the feature which, when combined with the feature chosen first, does the best job of improving the split. Features continue to be added as long as there continues to be a statistically significant improvement in the resulting split.
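
The greedy procedure just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any particular published algorithm: the features are assumed to be Boolean, split quality is measured by reduction in Gini impurity, and a fixed `min_gain` threshold stands in for a real test of statistical significance. The names `gini`, `split_gain`, `greedy_conjunction`, and `min_gain` are hypothetical.

```python
from typing import Dict, List, Sequence

# Assumed record layout: feature name -> Boolean value.
Record = Dict[str, bool]


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of a collection of 0/1 class labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_gain(records: Sequence[Record], labels: Sequence[int],
               features: Sequence[str]) -> float:
    """Impurity reduction from splitting on the conjunction (AND) of features."""
    inside, outside = [], []
    for record, label in zip(records, labels):
        # A record goes to the "inside" child only if every feature is true.
        (inside if all(record[f] for f in features) else outside).append(label)
    weighted = (len(inside) * gini(inside) + len(outside) * gini(outside)) / len(labels)
    return gini(labels) - weighted


def greedy_conjunction(records: Sequence[Record], labels: Sequence[int],
                       min_gain: float = 0.01) -> List[str]:
    """Add features to the conjunction one at a time while the split improves.

    min_gain is an assumed stand-in for a statistical significance test.
    """
    if not records:
        return []
    chosen: List[str] = []
    best_gain = 0.0
    candidates = set(records[0])
    while candidates:
        # Find the feature that helps most when combined with those already chosen.
        gain, feature = max(
            (split_gain(records, labels, chosen + [f]), f) for f in sorted(candidates)
        )
        if gain - best_gain < min_gain:
            break  # improvement too small: stop adding features
        chosen.append(feature)
        candidates.remove(feature)
        best_gain = gain
    return chosen


if __name__ == "__main__":
    # Toy data: the condition occurs only when a person is both old and female.
    values = (True, False)
    records = [{"old": o, "female": f, "smoker": s}
               for o in values for f in values for s in values]
    labels = [1 if r["old"] and r["female"] else 0 for r in records]
    print(greedy_conjunction(records, labels))
```

On this toy data, the loop selects old and female and then stops, because adding smoker makes the split worse. A single split on the conjunction then separates the records that two levels of single-variable splits would otherwise have to divide.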