One popular splitting criterion is named Gini, after Italian statistician and economist, Corrado Gini. This measure, which is also used by biologists and ecologists studying population diversity, gives the probability that two items chosen at random from the same population are in the same class. For a pure population, this probability is 1. The Gini measure of a node is simply the sum of the squares of the propor tions of the classes. the parent population has an equal number of light and dark dots. A node with equal numbers of each of 2 classes has a score of 0.52 + 0.52 = 0.5, which is expected because the chance of picking the same class twice by random selection with replacement is one out of two. The Gini score for either of the resulting nodes is 0.12 + 0.92 = 0.82. A perfectly pure node would have a Gini score of 1. A node that is evenly bal anced would have a Gini score of 0.5. Sometimes the scores is doubled and then 1 subtracted, so it is between 0 and 1. However, such a manipulation makes no difference when comparing different scores to optimize purity.
To calculate the impact of a split, take the Gini score of each child node and multiply it by the proportion of records that reach that node and then sum the resulting numbers. In this case, since the records are split evenly between the two nodes resulting from the split and each node has the same Gini score, the score for the split is the same as for either of the two nodes. Decision Trees 179 Entropy Reduction or Information Gain Information gain uses a clever idea for defining purity. If a leaf is entirely pure, then the classes in the leaf can be easily described—they all fall in the same class. On the other hand, if a leaf is highly impure, then describing it is much more complicated. Information theory, a part of computer science, has devised a measure for this situation called entropy. In information theory, entropy is a measure of how disorganized a system is. A comprehensive introduction to information theory is far beyond the scope of this book. For our purposes, the intuitive notion is that the number of bits required to describe a particular sit uation or outcome depends on the size of the set of possible outcomes. Entropy can be thought of as a measure of the number of yes/no questions it would take to determine the state of the system. If there are 16 possible states, it takes log2(16), or four bits, to enumerate them or identify a particular one. Addi tional information reduces the number of questions needed to determine the state of the system, so information gain means the same thing as entropy reduction. Both terms are used to describe decision tree algorithms.