Cluster Analysis
 
 

Distance Measures

The joining or tree clustering method uses the dissimilarities or distances between objects when forming the clusters. These distances can be based on a single dimension or multiple dimensions.

Euclidean distance. This is probably the most commonly chosen type of distance. It is simply the geometric distance in multidimensional space. It is computed as:

$$\mathrm{Distance}(x,y) = \left[\sum_i (x_i - y_i)^2\right]^{1/2}$$
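As a minimal sketch of the Euclidean formula, the function below computes the distance between two objects given as equal-length lists of numbers. The function name `euclidean_distance` and the sample points are illustrative assumptions, not part of the original page.

```python
import math


def euclidean_distance(x, y):
    """Geometric distance between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    # Sum the squared per-dimension differences, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Hypothetical two-dimensional objects forming a 3-4-5 right triangle.
print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```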

Squared Euclidean distance. One may want to square the standard Euclidean distance in order to place progressively greater weight on objects that are further apart. This distance is computed as:

$$\mathrm{Distance}(x,y) = \sum_i (x_i - y_i)^2$$
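The progressive weighting can be seen in a small sketch: an object three times farther away receives nine times the squared distance. The function name and the sample points are assumptions chosen for illustration.

```python
def squared_euclidean_distance(x, y):
    """Sum of squared per-dimension differences (no square root)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


# Hypothetical objects at 1 and 3 units from the origin along one dimension.
near = squared_euclidean_distance([0, 0], [1, 0])
far = squared_euclidean_distance([0, 0], [3, 0])
print(near, far)  # 1 9
```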

City-block (Manhattan) distance. This distance is the sum of the absolute differences across dimensions. In most cases, this distance measure yields results similar to the simple Euclidean distance. However, in this measure the effect of single large differences (outliers) is dampened, since the differences are not squared. The city-block distance is computed as:

$$\mathrm{Distance}(x,y) = \sum_i |x_i - y_i|$$
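The dampening of outliers can be sketched by comparing how much of the total a single large difference contributes, with and without squaring. The function name and the sample data are hypothetical, chosen only to make the effect visible.

```python
def city_block_distance(x, y):
    """Sum of absolute per-dimension differences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


# Hypothetical objects that differ slightly on two dimensions and a lot on one.
x = [0, 0, 0]
y = [1, 1, 10]

total = city_block_distance(x, y)
print(total)  # 12

# Share of each total contributed by the single large difference (dimension 3).
share_city_block = abs(x[2] - y[2]) / total
share_squared = (x[2] - y[2]) ** 2 / sum((xi - yi) ** 2 for xi, yi in zip(x, y))
print(share_city_block < share_squared)  # True
```

The large difference accounts for 10 of 12 units in the city-block total but 100 of 102 units once the differences are squared.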

Chebychev distance. This distance measure may be appropriate in cases when one wants to define two objects as "different" if they are different on any one of the dimensions. The Chebychev distance is computed as:

$$\mathrm{Distance}(x,y) = \max_i |x_i - y_i|$$
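A short sketch of this measure follows: only the dimension with the largest absolute difference determines the result. The function name and the sample values are assumptions for illustration.

```python
def chebychev_distance(x, y):
    """Largest absolute per-dimension difference."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    # default=0 covers the degenerate case of zero dimensions.
    return max((abs(xi - yi) for xi, yi in zip(x, y)), default=0)


# Hypothetical objects that agree closely except on the third dimension.
print(chebychev_distance([1, 5, 9], [2, 5, 3]))  # 6
```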

Power distance. Sometimes one may want to increase or decrease the progressive weight that is placed on dimensions on which the respective objects are very different. This can be accomplished via the power distance. The power distance is computed as:

$$\mathrm{Distance}(x,y) = \left(\sum_i |x_i - y_i|^p\right)^{1/r}$$
where r and p are user-defined parameters.
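The sketch below implements the power distance directly from the formula. It follows from the formulas that setting p = r = 2 gives the Euclidean distance and p = r = 1 gives the city-block distance. The function name, the guard against r = 0, and the sample points are assumptions added for illustration.

```python
def power_distance(x, y, p, r):
    """(sum of |x_i - y_i|**p) ** (1/r) for user-chosen p and r."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    if r == 0:
        raise ValueError("r must be nonzero")  # assumption: 1/r must be defined
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / r)


# Hypothetical two-dimensional objects.
a, b = [0, 0], [3, 4]

euclidean_like = power_distance(a, b, p=2, r=2)   # matches the Euclidean distance
city_block_like = power_distance(a, b, p=1, r=1)  # matches the city-block distance
print(euclidean_like, city_block_like)
```

Raising p above r places more weight on the dimensions where the two objects differ most.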


Percent disagreement. This measure is particularly useful if the data for the dimensions included in the analysis are categorical in nature. This distance is computed as:

$$\mathrm{Distance}(x,y) = \frac{\text{Number of } x_i \neq y_i}{n}$$
where n is the total number of dimensions.
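For categorical data, the measure reduces to counting mismatches, as in the sketch below. The function name and the category labels are hypothetical and serve only as an example.

```python
def percent_disagreement(x, y):
    """Fraction of dimensions on which two objects have different values."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of dimensions")
    if not x:
        raise ValueError("at least one dimension is required")
    mismatches = sum(1 for xi, yi in zip(x, y) if xi != yi)
    return mismatches / len(x)


# Hypothetical categorical descriptions of two objects.
site_a = ["urban", "coastal", "high", "yes"]
site_b = ["urban", "inland", "high", "no"]
print(percent_disagreement(site_a, site_b))  # 0.5
```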