 
Cluster Analysis

Amalgamation or Linkage Rules
At the first step, when each object represents its own cluster, the distances
between those objects are defined by the chosen distance measure. However, once
several objects have been linked together, we need a linkage or amalgamation
rule that defines the distance between two clusters and thus determines when
two clusters are sufficiently similar to be linked together. The major linkage
rules include:
Single linkage (nearest neighbor). In this method, the distance between
two clusters is determined by the distance between the two closest objects
(nearest neighbors), one from each cluster. This rule will, in a sense, string
objects together to form clusters, and the resulting clusters tend to represent
long "chains."
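As a minimal sketch (assuming Euclidean distance between objects; `single_linkage` is a hypothetical helper name), single linkage takes the minimum distance over all pairs with one object from each cluster. The example splits a straight chain of points spaced one unit apart into two fragments:

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def single_linkage(a: List[Point], b: List[Point]) -> float:
    """Nearest-neighbor distance: the closest pair, one object from each cluster."""
    return min(math.dist(p, q) for p in a for q in b)


# Points at x = 0, 1, 2, 3, 4 on a straight line, split into two fragments.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
print(single_linkage(left, right))  # 1.0, the gap between (1, 0) and (2, 0)
```

The two fragments are only 1 unit apart under single linkage even though their outermost objects are 4 units apart, which is why this rule readily strings chains together.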
Complete linkage (furthest neighbor). In this method, the distance between
two clusters is determined by the greatest distance between any two objects,
one from each cluster (i.e., by the "furthest neighbors"). This method usually
performs quite well when the objects actually form naturally distinct "clumps."
If the clusters tend to be elongated or chain-like, then this method is
inappropriate.
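The complete-linkage counterpart replaces the minimum with a maximum. This is again a sketch assuming Euclidean distance, with a hypothetical helper name, applied to two fragments of a straight chain of points spaced one unit apart:

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def complete_linkage(a: List[Point], b: List[Point]) -> float:
    """Furthest-neighbor distance: the most distant pair, one object from each cluster."""
    return max(math.dist(p, q) for p in a for q in b)


# Points at x = 0, 1, 2, 3, 4 on a straight line, split into two fragments.
left = [(0.0, 0.0), (1.0, 0.0)]
right = [(2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
print(complete_linkage(left, right))  # 4.0, from (0, 0) to (4, 0)
```

Although the fragments are separated by only a 1-unit gap, complete linkage reports 4, so pieces of an elongated cluster look far apart to this rule.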
Unweighted pair-group average. In this method, the distance between two
clusters is calculated as the average distance between all pairs of objects in
the two different clusters (one object from each). This method is also very
efficient when the objects form natural distinct "clumps"; however, it performs
equally well with elongated, chain-like clusters.
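A minimal sketch of the unweighted pair-group average (often abbreviated UPGMA; SciPy's `linkage(..., method='average')` implements this rule), assuming Euclidean distance. It also checks that, after clusters i and j are merged, the new distance to another cluster k follows from the old distances weighted by the cluster sizes, without revisiting every pair. The function names are hypothetical:

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def average_linkage(a: List[Point], b: List[Point]) -> float:
    """Mean distance over all pairs with one object from each cluster."""
    total = sum(math.dist(p, q) for p in a for q in b)
    return total / (len(a) * len(b))


def upgma_update(d_ki: float, d_kj: float, n_i: int, n_j: int) -> float:
    """Distance from cluster k to the merged cluster (i + j), computed from the
    old distances weighted by the cluster sizes n_i and n_j."""
    return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)


k = [(0.0, 0.0)]
ci = [(3.0, 0.0), (5.0, 0.0)]
cj = [(10.0, 0.0)]

direct = average_linkage(k, ci + cj)  # (3 + 5 + 10) / 3
updated = upgma_update(average_linkage(k, ci), average_linkage(k, cj), len(ci), len(cj))
print(direct, updated)  # 6.0 6.0
```

Because every cross-cluster pair counts once, a cluster's influence on the average grows with its number of objects.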
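Weighted pair-group average. This method is identical to the unweighted
pair-group average method, except that the sizes of the respective clusters
(i.e., the number of objects contained in them) are treated differently in the
computations: when two clusters are merged, their distances to every other
cluster are averaged with equal weight regardless of how many objects each
contains, so objects in the smaller cluster receive relatively more weight.
Thus, this method (rather than the previous method) should be used when the
cluster sizes are suspected to be greatly uneven.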
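In the usual formulation of the weighted pair-group average (WPGMA, SciPy's `method='weighted'`), the distance from a newly merged cluster to any other cluster is the plain mean of the two previous distances, so the two merged clusters count equally whatever their sizes. The sketch below, with hypothetical function names and made-up distances, contrasts this with the size-weighted update of the unweighted method when one cluster is much larger than the other:

```python
def upgma_update(d_ki: float, d_kj: float, n_i: int, n_j: int) -> float:
    """Unweighted average: old distances weighted by cluster sizes."""
    return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)


def wpgma_update(d_ki: float, d_kj: float) -> float:
    """Weighted average (WPGMA): the two merged clusters count equally,
    whatever their sizes."""
    return (d_ki + d_kj) / 2.0


# Made-up example: cluster i has 9 objects and lies at distance 2.0 from k;
# cluster j has 1 object and lies at distance 12.0 from k.
print(upgma_update(2.0, 12.0, 9, 1))  # 3.0, dominated by the large cluster i
print(wpgma_update(2.0, 12.0))        # 7.0, both clusters count equally
```

Because WPGMA ignores n_i and n_j, the single object in cluster j carries as much weight as all nine objects in cluster i combined.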
Unweighted pair-group centroid. The centroid of a cluster is the average
point in the multidimensional space defined by the dimensions. In a sense, it is
the center of gravity for the respective cluster. In this method, the distance
between two clusters is determined as the distance between their centroids.
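A minimal sketch of the centroid computation and the resulting cluster distance, assuming Euclidean distance (SciPy calls this method `'centroid'`, also known as UPGMC); the function names are hypothetical:

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]


def centroid(cluster: List[Point]) -> Point:
    """Average point: the mean of each coordinate over the cluster's objects."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def centroid_linkage(a: List[Point], b: List[Point]) -> float:
    """Distance between two clusters = Euclidean distance between centroids."""
    return math.dist(centroid(a), centroid(b))


a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]  # centroid (1.0, 1.0)
b = [(4.0, 4.0), (4.0, 6.0)]                          # centroid (4.0, 5.0)
print(centroid_linkage(a, b))                         # 5.0
```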
Weighted pair-group centroid (median). This method is identical to the
previous one, except that the two clusters being merged are weighted equally,
regardless of their sizes (i.e., the number of objects contained in them): the
center of the merged cluster is placed at the midpoint of the two previous
centers rather than at their size-weighted average, so a large cluster cannot
swamp a small one. Thus, when there are (or one suspects there to be)
considerable differences in cluster sizes, this method is preferable to the
previous one.
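The two centroid methods differ in where the merged cluster's center is placed. In the common formulation (UPGMC versus WPGMC, SciPy's `'centroid'` and `'median'`), the unweighted method uses the size-weighted mean of the two centroids, while the median method takes their midpoint. A sketch with hypothetical function names, merging a 99-object cluster with a single object:

```python
from typing import Tuple

Point = Tuple[float, ...]


def centroid_merge(c_i: Point, n_i: int, c_j: Point, n_j: int) -> Point:
    """Unweighted centroid method: merged centroid is the size-weighted mean."""
    return tuple((n_i * x + n_j * y) / (n_i + n_j) for x, y in zip(c_i, c_j))


def median_merge(c_i: Point, c_j: Point) -> Point:
    """Median method: merged center is the midpoint, whatever the sizes."""
    return tuple((x + y) / 2.0 for x, y in zip(c_i, c_j))


big_center, big_n = (0.0, 0.0), 99      # a large cluster
small_center, small_n = (10.0, 0.0), 1  # a single object

print(centroid_merge(big_center, big_n, small_center, small_n))  # (0.1, 0.0)
print(median_merge(big_center, small_center))                    # (5.0, 0.0)
```

With size weighting the new center barely moves away from the large cluster; with the median rule the single object pulls it halfway.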
Ward's method. This method is distinct from all other methods because it
uses an analysis of variance approach to evaluate the distances between
clusters. In short, at each step it merges the two clusters whose union yields
the smallest increase in the total within-cluster Sum of Squares (SS). In
general, this method is regarded as very efficient; however, it tends to create
clusters of small size.
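A minimal sketch of Ward's criterion, assuming Euclidean distance: `ward_cost` computes the increase in total within-cluster SS caused by a merge directly, and `ward_cost_closed_form` uses the equivalent expression n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2, where c_a and c_b are the centroids. All function names are hypothetical:

```python
import math
from typing import List, Tuple

Point = Tuple[float, ...]
Cluster = List[Point]


def centroid(cluster: Cluster) -> Point:
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def within_ss(cluster: Cluster) -> float:
    """Sum of squared Euclidean distances from each object to the centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in cluster)


def ward_cost(a: Cluster, b: Cluster) -> float:
    """Increase in total within-cluster SS if clusters a and b are merged."""
    return within_ss(a + b) - within_ss(a) - within_ss(b)


def ward_cost_closed_form(a: Cluster, b: Cluster) -> float:
    """Same quantity via n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2."""
    n_a, n_b = len(a), len(b)
    return n_a * n_b / (n_a + n_b) * math.dist(centroid(a), centroid(b)) ** 2


def pick_ward_merge(clusters: List[Cluster]) -> Tuple[int, int]:
    """Indices (i, j) of the pair whose merge increases the SS the least."""
    pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
    return min(pairs, key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))


clusters = [[(0.0, 0.0), (0.0, 2.0)], [(5.0, 0.0)], [(6.0, 0.0)]]
print(pick_ward_merge(clusters))            # (1, 2)
print(ward_cost(clusters[1], clusters[2]))  # 0.5
```

In this toy example the two single objects at x = 5 and x = 6 are merged first, since joining them raises the within-cluster SS by only 0.5.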
