The issues raised by the choice of distance measures are exactly the same as those previously discussed in relation to the K-means approach. It might seem that with N initial clusters for N data points, N2 measurement calculations are required to create the distance table. If the similarity measure is a true distance metric, only half that is needed because all true distance met rics follow the rule that Distance(X,Y) = Distance(Y,X). In the vocabulary of mathematics, the similarity matrix is lower triangular. The next step is to find the smallest value in the similarity matrix. This identifies the two clusters that are most similar to one another. Merge these two clusters into a new one and update the similarity matrix by replacing the two rows that described the par ent cluster with a new row that describes the distance between the merged cluster and the remaining clusters. There are now N - 1 clusters and N - 1 rows in the similarity matrix. Repeat the merge step N - 1 times, so all records belong to the same large cluster. Each iteration remembers which clusters were merged and the dis tance between them. This information is used to decide which level of cluster ing to make use of.
In the single linkage method, the distance between two clusters is given by the distance between the closest members. This method produces clusters with the property that every member of a cluster is more closely related to at least one member of its cluster than to any point outside it. Another approach is the complete linkage method, where the distance between two clusters is given by the distance between their most distant members. This method produces clusters with the property that all members lie within some known maximum distance of one another. Third method is the centroid distance, where the distance between two clusters is measured between the centroids of each. Clusters and Trees The agglomeration algorithm creates hierarchical clusters. At each level in the hierarchy, clusters are formed from the union of two clusters at the next level down. A good way of visualizing these clusters is as a tree. Of course, such a tree may look like the decision trees discussed in, but there are some important differences. The most important is that the nodes of the cluster tree do not embed rules describing why the clustering takes place; the nodes sim ply state the fact that the two children have the minimum distance of all pos sible clusters pairs. Another difference is that a decision tree is created to maximize the leaf purity of a given target variable. There is no target for the cluster tree, other than self-similarity within each cluster. Later in this chapter we’ll discuss divisive clustering methods. These are similar to the agglomera tive methods, except that agglomerative methods are build by starting from the leaving and working towards the root whereas divisive methods start at the root and work down to the leaves.