CATEGORICAL VARIABLES

Suppose the important business question is not who will respond but what the size of the customer’s next order will be. The decision tree can be used to answer that question too. Assuming that order amount is one of the variables available in the preclassified model set, the average order size in each leaf can be used as the estimated order size for any unclassified record that meets the criteria for that leaf. It is even possible to use a numeric target variable to build the tree; such a tree is called a regression tree. Instead of increasing the purity of a categorical variable, each split in the tree is chosen to decrease the variance in the values of the target variable within each child node.

The fact that trees can be (and sometimes are) used to estimate continuous values does not make doing so a good idea. A decision tree estimator can only generate as many discrete values as there are leaves in the tree. To estimate a continuous variable, it is preferable to use a continuous function. Regression models and neural network models are generally more appropriate for estimation.
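The variance-reducing split and the leaf-average estimate can be illustrated with a minimal sketch. The data, the variable names (months_since_last_order, order_amount), and the helper functions below are hypothetical and chosen only for illustration; the sketch makes a single split on one numeric input, whereas a real regression tree would keep splitting.

```python
from statistics import mean, pvariance

def weighted_variance(left, right):
    """Size-weighted average of the population variances of two child nodes."""
    n = len(left) + len(right)
    return (len(left) * pvariance(left) + len(right) * pvariance(right)) / n

def best_variance_split(xs, ys):
    """Return (threshold, score) for the split 'x <= threshold' that
    leaves the lowest weighted variance of the target in the children."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal input values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = weighted_variance(left, right)
        if best is None or score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical preclassified model set: one input and a numeric target.
months_since_last_order = [1, 2, 3, 10, 11, 12]
order_amount = [100.0, 110.0, 90.0, 30.0, 40.0, 20.0]

threshold, score = best_variance_split(months_since_last_order, order_amount)

def estimate(months):
    """Estimate order size as the average target value in the matching leaf."""
    leaf = [y for x, y in zip(months_since_last_order, order_amount)
            if (x <= threshold) == (months <= threshold)]
    return mean(leaf)

print(threshold)     # split point chosen for the single split
print(estimate(2))   # average order amount in the left leaf
print(estimate(11))  # average order amount in the right leaf
```

With only two leaves, estimate can return only two distinct values no matter what input it is given, which is the limitation described above: the number of possible estimates is bounded by the number of leaves.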

The tree is a binary tree of nonuniform depth; that is, each nonleaf node has two children and leaves are not all at the same distance from the root. In this case, each node represents a yes-or-no question, whose answer determines by which of two paths a record proceeds to the next level of the tree. Since any multiway split can be expressed as a series of binary splits, there is no real need for trees with higher branching factors. Nevertheless, many data mining tools are capable of producing trees with more than two branches. For example, some decision tree algorithms split on categorical variables by creating a branch for each class, leading to trees with differing numbers of branches at different nodes.

Although there are many variations on the core decision tree algorithm, all of them share the same basic procedure: Repeatedly split the data into smaller and smaller groups in such a way that each new generation of nodes has greater purity than its ancestors with respect to the target variable. For most of this discussion, we assume a binary, categorical target variable, such as responder/nonresponder. This simplifies the explanations without much loss of generality.
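The basic procedure can be sketched in a few lines. This is an illustration under stated assumptions, not any particular tool's algorithm: purity is measured here with Gini impurity (one common choice; the text above does not name a measure), every question has the form "does this field equal this value?", and the records and the field names channel, region, and responder are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity (assumed purity measure): 0.0 for a pure node,
    0.5 for an evenly mixed binary node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_question(rows, target):
    """Find the yes/no question 'row[field] == value?' whose two children
    have the lowest size-weighted impurity. Returns (score, field, value)."""
    best = None
    for field in rows[0]:
        if field == target:
            continue
        for value in sorted({r[field] for r in rows}):
            yes = [r[target] for r in rows if r[field] == value]
            no = [r[target] for r in rows if r[field] != value]
            if not yes or not no:
                continue  # the question does not actually divide the node
            score = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
            if best is None or score < best[0]:
                best = (score, field, value)
    return best

def build(rows, target):
    """Split repeatedly while the children are purer than the parent.
    A leaf is a class label; a nonleaf is (field, value, yes_subtree, no_subtree)."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if gini(labels) == 0.0:
        return majority
    question = best_question(rows, target)
    if question is None or question[0] >= gini(labels):
        return majority  # no question improves purity
    _, field, value = question
    yes_rows = [r for r in rows if r[field] == value]
    no_rows = [r for r in rows if r[field] != value]
    return (field, value, build(yes_rows, target), build(no_rows, target))

def classify(tree, row):
    """Follow the yes or no path at each node until a leaf is reached."""
    while isinstance(tree, tuple):
        field, value, yes_subtree, no_subtree = tree
        tree = yes_subtree if row[field] == value else no_subtree
    return tree

# Hypothetical preclassified records with a binary target.
rows = [
    {"channel": "mail", "region": "east", "responder": "yes"},
    {"channel": "mail", "region": "west", "responder": "yes"},
    {"channel": "web", "region": "east", "responder": "no"},
    {"channel": "web", "region": "west", "responder": "no"},
    {"channel": "phone", "region": "east", "responder": "yes"},
    {"channel": "phone", "region": "west", "responder": "no"},
]

tree = build(rows, "responder")
print(tree)
```

On this data the three-valued channel variable is handled by two successive yes-or-no questions rather than one three-way branch, and the leaves end up at different depths: mail records reach a leaf after one question, while phone records need a further question on region.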