CAPTURING IMPORTANT ATTRIBUTES
 
Finally, to capture important attributes of a town that were not readily discernible from variables already in the signature, additional variables were derived from those already present. For example, both distance and direction from Boston seemed likely to be important in forming town clusters. Both were calculated relative to the latitude and longitude of the gold-domed State House that Oliver Wendell Holmes Sr. once called “the hub of the solar system.” (Today’s Bostonians are not as modest as Dr. Holmes; they now refer to the entire city as “the hub of the universe” or simply “the Hub.” Headline writers commonly save three letters by using “Hub” in place of “Boston,” as in the apocryphal “Hub man killed in NYC terror attack.”) The online postal service database provides a convenient source for the latitude and longitude of each town. Most towns have a single zip code; for those with more than one, the coordinates of the lowest-numbered zip code were arbitrarily chosen. The distance from each town to Boston was then easily calculated from the latitude and longitude using standard Euclidean distance. Despite rumors that have reached us that the Earth is round, we used simple plane geometry for these calculations:
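
A minimal sketch of such plane-geometry calculations in Python follows. It is an illustration under stated assumptions, not the exact formulas used in the study: the function names, the hub coordinates, and the sample town are hypothetical, and the use of a four-quadrant arctangent for direction is our choice rather than something the text specifies.

```python
import math

# Approximate, illustrative coordinates for the hub (degrees).
# These are placeholders, not values taken from the study.
HUB_LAT, HUB_LON = 42.36, -71.06


def distance_from_hub(town_lat, town_lon, hub_lat=HUB_LAT, hub_lon=HUB_LON):
    """Flat-plane Euclidean distance, measured in raw degrees."""
    return math.hypot(hub_lat - town_lat, hub_lon - town_lon)


def direction_from_hub(town_lat, town_lon, hub_lat=HUB_LAT, hub_lon=HUB_LON):
    """Angle of the town as seen from the hub, in radians.

    Measured counterclockwise from due east. atan2 is used (an assumption)
    so that towns due north or south of the hub do not divide by zero.
    """
    return math.atan2(town_lat - hub_lat, town_lon - hub_lon)


# Hypothetical town somewhere west of the hub.
town_lat, town_lon = 42.27, -71.80
print(distance_from_hub(town_lat, town_lon))
print(direction_from_hub(town_lat, town_lon))
```

Both results are left in raw units (degrees of arc and radians), which is sufficient when the values are only compared with one another.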

These formulas are imprecise, since they assume that the Earth is flat and that one degree of latitude has the same length as one degree of longitude. The area in question is not large enough for these flat-Earth assumptions to make much difference. Also note that since these values will only be compared to one another, there is no need to convert them into familiar units such as miles, kilometers, or degrees.

Creating Clusters

The first attempt to build clusters used signatures that described the towns in terms of both demographics and geography. Clusters built this way could not be used directly to create editorial zones because of the geographic constraint that editorial zones must comprise contiguous towns. Since towns with similar demographics are not necessarily close to one another, clusters based on our signatures included towns all over the map.

Weighting could be used to increase the importance of the geographic variables in cluster formation, but weights heavy enough to force contiguous clusters would cause the nongeographic variables to be ignored completely. Since the goal was to find similarities based at least partially on demographic data, the idea of geographic clusters was abandoned in favor of demographic ones. The demographic clusters could then be used as one factor in designing editorial zones, along with the geographic constraints.
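
The weighting trade-off can be illustrated with a small sketch, assuming a distance-based clustering method in which each variable is multiplied by a weight before Euclidean distance is computed. The signatures, town labels, and weights below are invented for illustration and are not data from the study.

```python
import math


def weighted_distance(a, b, weights):
    """Euclidean distance after scaling each variable by its weight."""
    return math.sqrt(sum((w * (x - y)) ** 2 for x, y, w in zip(a, b, weights)))


def nearest(target, candidates, weights):
    """Name of the candidate signature closest to the target."""
    return min(
        candidates,
        key=lambda name: weighted_distance(target, candidates[name], weights),
    )


# Hypothetical standardized signatures:
# (distance from hub, direction from hub, demographic 1, demographic 2)
town = (0.0, 0.0, 1.0, 1.0)
others = {
    "neighbor": (0.2, 0.1, -1.0, -1.0),  # close by, different demographics
    "lookalike": (2.0, 1.5, 0.9, 1.1),   # far away, similar demographics
}

equal_weights = (1, 1, 1, 1)
geo_heavy = (10, 10, 1, 1)  # geography weighted ten times as heavily

print(nearest(town, others, equal_weights))
print(nearest(town, others, geo_heavy))
```

With equal weights the demographically similar town is the closer match; once the geographic variables are weighted heavily, the physically nearby town wins regardless of demographics, which is the effect described above.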