CLUSTERING PEOPLE BY AGE
 
This illustration of agglomerative clustering uses a one-dimensional example with the single linkage measure for the distance between clusters. These choices make it possible to follow the algorithm through all its iterations without having to calculate distances using squares and square roots. The data consists of the ages of people at a family gathering. The goal is to cluster the participants by age, and the metric for the distance between two people is simply the difference in their ages. The metric for the distance between two clusters of people is the difference in age between the oldest member of the younger cluster and the youngest member of the older cluster (the one-dimensional version of the single linkage measure). Because the distances are so easy to calculate, the example dispenses with the similarity matrix. The procedure is to sort the participants by age, then begin clustering by first merging clusters that are 1 year apart, then 2 years, and so on until there is only one big cluster. The most useful level of clustering seems to be the one at which the algorithm has grouped the population into three generations: children, parents, and grandparents.
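The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the original example: the ages are hypothetical, whole-number ages are assumed, and the function name `single_linkage_1d` is invented for this sketch. It sorts the ages, then raises the distance threshold one year at a time, merging neighboring clusters whose gap is within the threshold.

```python
def single_linkage_1d(ages):
    """Cluster integer ages by single linkage, one year of distance at a time.

    Returns a list of (threshold, clusters) pairs, one for every threshold
    at which at least one merge happened.
    """
    # Start with every person in a cluster of one, sorted by age.
    clusters = [[age] for age in sorted(ages)]
    history = []
    threshold = 0
    while len(clusters) > 1:
        threshold += 1
        merged = [clusters[0]]
        for cluster in clusters[1:]:
            # Single linkage in one dimension: the gap between the oldest
            # member of the younger cluster and the youngest of the older one.
            gap = cluster[0] - merged[-1][-1]
            if gap <= threshold:
                merged[-1] = merged[-1] + cluster
            else:
                merged.append(cluster)
        if len(merged) < len(clusters):
            history.append((threshold, merged))
        clusters = merged
    return history


# Hypothetical ages at a family gathering (not the data from the original figure).
ages = [2, 4, 7, 8, 35, 38, 41, 66, 70]
for threshold, clusters in single_linkage_1d(ages):
    print(threshold, clusters)
```

With these made-up ages, three clusters remain once the threshold reaches 4 years, and no further merge happens until the threshold reaches 25 years. A long stretch with no merges is one informal sign that a level of clustering is worth a closer look.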

Divisive Clustering

We have already noted some similarities between trees formed by the agglomerative clustering techniques and ones formed by decision tree algorithms. Although the agglomerative methods work from the leaves to the root, while the decision tree algorithms work from the root to the leaves, they both create a similar hierarchical structure. The hierarchical structure reflects another similarity between the methods. Decisions made early on in the process are never revisited, which means that some fairly simple clusters may not be detected if an early split or agglomeration destroys the structure.

Seeing the similarity between the trees produced by the two methods, it is natural to ask whether the algorithms used for decision trees may also be used for clustering. The answer is yes. A decision tree algorithm starts with the entire collection of records and looks for a way to split it into partitions that are purer, in some sense defined by a purity function. In the standard decision tree algorithms, the purity function uses a separate variable, the target variable, to make this decision. All that is required to turn decision trees into a clustering algorithm is to supply a purity function chosen to either minimize the average intracluster distance or maximize the intercluster distances. An example of such a purity function is the average distance from the centroid of the parent.
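One way to picture this is a top-down splitter that needs no target variable. The sketch below is only an illustration under stated assumptions: it works on one-dimensional data, it scores a split by the total distance of each record from the centroid (mean) of its own partition, which is one possible reading of a distance-based purity function, and the names `spread`, `best_split`, and `divisive_clusters` are invented here. The ages are hypothetical.

```python
def spread(values):
    """Total distance of the values from their centroid (the mean)."""
    centroid = sum(values) / len(values)
    return sum(abs(v - centroid) for v in values)


def best_split(values):
    """Find the cut of a sorted list that leaves the least total spread.

    Returns (total_spread_after_split, left, right), or None if no cut exists.
    """
    best = None
    for i in range(1, len(values)):
        left, right = values[:i], values[i:]
        if left[-1] == right[0]:
            continue  # equal values cannot be separated by a threshold
        score = spread(left) + spread(right)
        if best is None or score < best[0]:
            best = (score, left, right)
    return best


def divisive_clusters(values, max_clusters):
    """Split top-down, always taking the split that improves purity the most."""
    clusters = [sorted(values)]
    while len(clusters) < max_clusters:
        candidates = []
        for index, cluster in enumerate(clusters):
            split = best_split(cluster)
            if split is not None:
                gain = spread(cluster) - split[0]
                candidates.append((gain, index, split[1], split[2]))
        if not candidates:
            break  # nothing left that can be split
        _, index, left, right = max(candidates, key=lambda c: c[0])
        clusters[index:index + 1] = [left, right]
    return clusters


# Hypothetical ages, not taken from the original text.
print(divisive_clusters([2, 4, 7, 8, 35, 38, 41, 66, 70], 3))
```

As with a decision tree, each split is final: once a partition is divided, later steps only subdivide the pieces and never revisit the earlier choice.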