Distance Measures
The joining or tree clustering method uses the dissimilarities or distances
between objects when forming the clusters. These distances can be based on a
single dimension or multiple dimensions.
Euclidean distance. This is probably the most commonly chosen type of
distance. It is simply the geometric (straight-line) distance in multidimensional space. It
is computed as:
Distance(x,y) = [Σ_{i} (x_{i} - y_{i})^{2}]^{1/2}
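As a minimal sketch of the formula above, assuming each object is given as an equal-length sequence of numeric coordinates (the function name euclidean_distance is illustrative only):

```python
import math

def euclidean_distance(x, y):
    """Square root of the summed squared differences across dimensions."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

# Differences of 3 and 4 on two dimensions give the familiar 3-4-5 triangle.
print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```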
Squared Euclidean distance. One may want to square the standard Euclidean
distance in order to place progressively greater weight on objects that are
further apart. This distance is computed as:
Distance(x,y) = Σ_{i} (x_{i} - y_{i})^{2}
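The progressive weighting can be seen in a small sketch (the function name and the sample points are hypothetical): doubling the separation between two objects quadruples the squared distance.

```python
def squared_euclidean_distance(x, y):
    """Sum of squared differences across dimensions (no square root)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

near = squared_euclidean_distance([0, 0], [1, 1])  # 1 + 1 = 2
far = squared_euclidean_distance([0, 0], [2, 2])   # 4 + 4 = 8

# The second pair is twice as far apart, but its squared distance is four times larger.
print(near, far, far / near)
```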
City-block (Manhattan) distance. This distance is simply the sum of the absolute
differences across dimensions. In most cases, this distance measure yields
results similar to the simple Euclidean distance. However, note that in this
measure, the effect of single large differences (outliers) is dampened (since
they are not squared). The city-block distance is computed as:
Distance(x,y) = Σ_{i} |x_{i} - y_{i}|
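A rough illustration of the dampening effect, using made-up data in which one dimension differs far more than the others (all names here are illustrative): the large difference accounts for a smaller share of the city-block total than of the squared Euclidean total.

```python
def city_block_distance(x, y):
    """Sum of absolute differences across dimensions."""
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

def squared_euclidean_distance(x, y):
    """Sum of squared differences, included only for comparison."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

x = [0, 0, 0]
y = [1, 1, 10]  # the third dimension holds one large difference

cb = city_block_distance(x, y)         # 1 + 1 + 10 = 12
sq = squared_euclidean_distance(x, y)  # 1 + 1 + 100 = 102

# Share of each total that is due to the single large difference.
outlier_share_cb = 10 / cb
outlier_share_sq = 100 / sq
print(round(outlier_share_cb, 2), round(outlier_share_sq, 2))  # 0.83 0.98
```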
Chebychev distance. This distance measure may be appropriate in cases
when one wants to define two objects as "different" if they are different on any
one of the dimensions. The Chebychev distance is computed as:
Distance(x,y) = Maximum_{i} |x_{i} - y_{i}|
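A short sketch of this measure, with hypothetical sample values: two objects that agree on every dimension but one are still as far apart as that one difference.

```python
def chebychev_distance(x, y):
    """Largest absolute difference found on any single dimension."""
    return max(abs(xi - yi) for xi, yi in zip(x, y))

# Identical on the first two dimensions, different only on the third.
print(chebychev_distance([1, 5, 3], [1, 5, 9]))  # 6
```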
Power distance. Sometimes one may want to increase or decrease the
progressive weight that is placed on dimensions on which the respective objects
are very different. This can be accomplished via the power distance. The power
distance is computed as:
Distance(x,y) = (Σ_{i} |x_{i} - y_{i}|^{p})^{1/r}
where r and p are user-defined parameters.
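The following sketch implements the power distance as written above, with p and r passed as arguments (the function name and test points are illustrative). Two special cases follow directly from the formula: p = r = 2 reproduces the Euclidean distance, and p = r = 1 reproduces the city-block distance.

```python
def power_distance(x, y, p, r):
    """Sum of |difference|**p across dimensions, raised to the power 1/r."""
    if r == 0:
        raise ValueError("r must be non-zero")
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / r)

x, y = [0, 0], [3, 4]
print(power_distance(x, y, p=2, r=2))  # same value as the Euclidean distance
print(power_distance(x, y, p=1, r=1))  # same value as the city-block distance
```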
Percent disagreement. This measure is particularly useful if the data for
the dimensions included in the analysis are categorical in nature. This distance
is computed as:
Distance(x,y) = (Number of x_{i} ≠ y_{i}) / n
where n is the total number of dimensions.
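For categorical data, the measure reduces to counting mismatches and dividing by the number of dimensions. A minimal sketch, with invented category labels and an illustrative function name:

```python
def percent_disagreement(x, y):
    """Fraction of dimensions on which the two objects take different values."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    mismatches = sum(1 for xi, yi in zip(x, y) if xi != yi)
    return mismatches / len(x)

a = ["red", "small", "round"]
b = ["red", "large", "square"]

# The objects differ on two of the three dimensions.
print(percent_disagreement(a, b))
```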