Hierarchical Cluster Analysis
Author: Dr. Hannah Volk-Jesussek
A hierarchical cluster analysis is a clustering method that creates a hierarchical tree or dendrogram of the objects to be clustered.
The tree represents the relationships between objects and shows how objects are clustered at different levels.
Example Hierarchical Cluster Analysis
In our example we asked people about how many hours per week they spend on social media platforms and at the gym.
We now want to know if there are clusters in this dataset and perform a hierarchical cluster analysis.
How is a Hierarchical Cluster Analysis calculated?
First, we plot the points in a scatter plot.
With this, we can start to create the clusters. In the first step, we assign each point to its own cluster, so we have as many clusters as we have people.
The goal is to merge clusters step by step until all points are in one cluster.
In each step, the two clusters that are closest together are merged.
To decide what "closest together" means, we need to determine two things:
- How the distance between two points is measured.
- How the distance between two clusters is derived from the points they contain (the linkage).
Distance between two points
Let's start with the question of how to calculate the distance between two points. Here are the most common distance measures:
- Euclidean Distance
- Manhattan Distance
- Maximum Distance
Let's take the distance between Max and Caro. The difference on the y-axis is 1 and the difference on the x-axis is 4.
Euclidean Distance
The Euclidean distance is the square root of the sum of the squared differences. So we calculate the square root of 4² + 1² = 17, which gives a distance of about 4.12.
Manhattan Distance
The Manhattan distance uses the sum of the absolute differences. So we simply calculate 4 plus 1, which gives a distance of 5.
Maximum Distance
The maximum distance is simply the maximum value of the absolute differences. In this case it is 4.
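The three measures can be checked with a few lines of Python. This is only a minimal sketch: it uses the differences between Max and Caro given above (4 and 1), and the variable names are our own choice.

```python
import math

# Differences between Max and Caro from the text:
# 4 on the x-axis and 1 on the y-axis.
dx, dy = 4, 1

euclidean = math.sqrt(dx**2 + dy**2)  # square root of the sum of squared differences
manhattan = abs(dx) + abs(dy)         # sum of the absolute differences
maximum = max(abs(dx), abs(dy))       # largest absolute difference

print(f"Euclidean: {euclidean:.2f}")  # Euclidean: 4.12
print(f"Manhattan: {manhattan}")      # Manhattan: 5
print(f"Maximum: {maximum}")          # Maximum: 4
```

For the same pair of points, the maximum distance is never larger than the Euclidean distance, which in turn is never larger than the Manhattan distance.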
Linkage Methods
Now that we know how to calculate distances between points, we need to determine how to measure the distance between two clusters that each contain several points.
Let's say we have a cluster with the points Joe and Lisa and a cluster with Max and Caro. Now how do we determine the distance between these two clusters? Here are the most popular methods:
- Single linkage
- Complete linkage
- Average linkage
Single linkage
Single linkage uses the distance between the closest elements in the clusters. This is the distance between Caro and Joe.
Complete linkage
Complete linkage uses the distance between the farthest elements in the clusters. Here, this is the distance between Max and Joe.
Average linkage
Average linkage uses the average of all pairwise distances. The distance is calculated for each pair and then averaged.
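The three linkage methods differ only in how they summarize the pairwise distances between two clusters. The following sketch illustrates this with the Euclidean distance. Note that the coordinates are invented for illustration (we assume social media hours first, gym hours second) and are not the values from the example, so the pair that turns out to be closest or farthest need not match the figure. The function names are also our own.

```python
import math
from itertools import product

# Hypothetical coordinates (social media hours, gym hours).
# These values are assumptions, not the data from the article.
points = {
    "Joe": (2, 5), "Lisa": (1, 5),
    "Max": (8, 3), "Caro": (4, 4),
}

def pairwise(cluster_a, cluster_b):
    """Euclidean distance for every pair with one point from each cluster."""
    return [math.dist(points[a], points[b])
            for a, b in product(cluster_a, cluster_b)]

def single_linkage(cluster_a, cluster_b):
    return min(pairwise(cluster_a, cluster_b))   # closest pair

def complete_linkage(cluster_a, cluster_b):
    return max(pairwise(cluster_a, cluster_b))   # farthest pair

def average_linkage(cluster_a, cluster_b):
    distances = pairwise(cluster_a, cluster_b)
    return sum(distances) / len(distances)       # mean of all pairs

left, right = ["Joe", "Lisa"], ["Max", "Caro"]
single = single_linkage(left, right)
complete = complete_linkage(left, right)
average = average_linkage(left, right)
print(f"single: {single:.2f}, complete: {complete:.2f}, average: {average:.2f}")
```

Whatever the data, the single linkage distance is the smallest of the three and the complete linkage distance the largest, with the average in between.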
Example: Hierarchical Cluster Analysis
For our example, we use the Euclidean distance and the single linkage method. So now we need the distance from each cluster to all other clusters.
For this we first need to calculate the distance matrix. In the distance matrix we enter the clusters on both dimensions and then calculate the distances from each cluster to every other cluster.
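The distance between Alan and Lisa, for example, is the Euclidean distance between their two points, that is, the square root of the sum of the squared differences on the two axes.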
We can now do this for all other combinations until we have calculated the full distance matrix. Now we can merge the first clusters by finding the smallest distance. This is the case between Joe and Lisa.
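As a rough sketch, the distance matrix and the search for the smallest entry could look like this in Python. The coordinates are hypothetical (we assume social media hours first, gym hours second). They were chosen so that Max and Caro differ by 4 and 1 as in the text, but they are not the original data, so the printed numbers are for illustration only.

```python
import math
from itertools import combinations

# Hypothetical coordinates (social media hours, gym hours).
# These values are assumptions, not the data from the article.
points = {
    "Alan": (7, 4), "Lisa": (1, 5), "Joe": (2, 5),
    "Max": (8, 3), "Caro": (4, 4),
}
names = list(points)

# Full symmetric distance matrix: one row and one column per person.
dist = {a: {b: math.dist(points[a], points[b]) for b in names} for a in names}

# Print the matrix with two decimals.
print(" " * 6 + "".join(f"{name:>6}" for name in names))
for a in names:
    print(f"{a:<6}" + "".join(f"{dist[a][b]:6.2f}" for b in names))

# The first merge joins the pair with the smallest distance.
closest = min(combinations(names, 2), key=lambda pair: dist[pair[0]][pair[1]])
print("closest pair:", closest)
```

With these made-up coordinates, the smallest entry also belongs to Joe and Lisa, as in the example.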
With this, we now combine Joe and Lisa into one cluster. In our tree diagram or dendrogram we can draw the first connection.
Now we need to update our distance matrix. We decided to use the single linkage method, so the distance between two clusters is given by the elements that are closest to each other. For each of the remaining clusters Alan, Max, and Caro, the closest point in the Lisa and Joe cluster is Joe.
So we calculate the distance from Alan to Joe, the distance from Max to Joe, and the distance from Caro to Joe.
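This update does not require going back to the raw points: with single linkage, the distance from any other cluster to the merged cluster is simply the smaller of its two old distances. Here is a minimal sketch of that rule. The helper `merge_single`, the label format "Lisa+Joe", and the coordinates are all assumptions made for illustration.

```python
import math
from itertools import combinations

# Hypothetical coordinates (social media hours, gym hours).
# These values are assumptions, not the data from the article.
points = {
    "Alan": (7, 4), "Lisa": (1, 5), "Joe": (2, 5),
    "Max": (8, 3), "Caro": (4, 4),
}

# Distance table: each unordered pair of cluster labels maps to a distance.
dist = {frozenset({p, q}): math.dist(points[p], points[q])
        for p, q in combinations(points, 2)}

def merge_single(dist, a, b):
    """Return a new table in which clusters a and b are merged (single linkage)."""
    merged = a + "+" + b
    others = {label for pair in dist for label in pair} - {a, b}
    # Keep every distance that involves neither a nor b.
    new_dist = {pair: d for pair, d in dist.items()
                if a not in pair and b not in pair}
    # Distance to the merged cluster: the smaller of the two old distances.
    for other in others:
        new_dist[frozenset({merged, other})] = min(
            dist[frozenset({a, other})], dist[frozenset({b, other})])
    return new_dist

updated = merge_single(dist, "Lisa", "Joe")
for pair, d in sorted(updated.items(), key=lambda item: item[1]):
    print(sorted(pair), f"{d:.2f}")
```

Replacing `min` with `max` in the loop would give the corresponding update for complete linkage.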
Now we again merge the clusters that are closest. These are Max and Alan.
In our tree diagram or dendrogram, we can draw the second connection.
Now we update the distance matrix again. Using the closest elements in each case, we calculate the distance between Alan and Joe, between Caro and Joe, and between Caro and Alan. The smallest distance is between the Caro cluster and the Lisa and Joe cluster.
So we connect these two clusters and draw the third connection in the tree diagram.
Now there are only two clusters left, and we merge them in the last step to obtain the finished dendrogram.
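The whole procedure can be written as a short loop. The following is a simple sketch, not an efficient implementation: it recomputes the single linkage distance from the raw points in every step. The coordinates are hypothetical (social media hours, gym hours) and were chosen so that the merge order matches the one described above. The merge distances themselves are therefore illustrative only.

```python
import math
from itertools import combinations

# Hypothetical coordinates (social media hours, gym hours).
# These values are assumptions, not the data from the article.
points = {
    "Alan": (7, 4), "Lisa": (1, 5), "Joe": (2, 5),
    "Max": (8, 3), "Caro": (4, 4),
}

def single_linkage(a, b):
    """Smallest Euclidean distance between a point in a and a point in b."""
    return min(math.dist(points[p], points[q]) for p in a for q in b)

def agglomerate(names):
    """Merge the two closest clusters until one cluster is left."""
    clusters = [frozenset([name]) for name in names]  # one cluster per person
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_linkage(*pair))
        merges.append((sorted(a), sorted(b), single_linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

merges = agglomerate(points)
for left, right, height in merges:
    print(f"merge {left} + {right} at distance {height:.2f}")
```

Each recorded merge corresponds to one connection in the dendrogram, and the merge distance is the height at which that connection is drawn.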
Calculate hierarchical cluster analysis with numiqo
To calculate a hierarchical cluster analysis online, just visit the statistics calculator and copy your own data into the table or use the link to load the dataset. Now we click on Cluster and select Hierarchical Cluster.
If we now click on Social Media and Gym, a hierarchical cluster analysis will be calculated for us. Additionally, we can specify the label, in our case the names of the people.
Now we can specify which linkage method should be used and how the distance should be calculated. We simply take single linkage and the Euclidean distance again.
Now we get the results below: the tree plot, a scatter plot, and the elbow plot. In the elbow plot we can choose how many clusters to take. We can see a kink here, so we take 4 clusters. We can select this above, and in the tree plot the four clusters are highlighted in different colors.
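Choosing a number of clusters corresponds to cutting the tree at a certain level. For single linkage, we get the same groups if we simply stop merging as soon as the desired number of clusters is reached. The sketch below shows this idea. It is not how numiqo computes its results, and both the coordinates and the function name `cluster_into` are assumptions for illustration.

```python
import math
from itertools import combinations

# Hypothetical coordinates (social media hours, gym hours).
# These values are assumptions, not the data from the article.
points = {
    "Alan": (7, 4), "Lisa": (1, 5), "Joe": (2, 5),
    "Max": (8, 3), "Caro": (4, 4),
}

def single_linkage(a, b):
    """Smallest Euclidean distance between a point in a and a point in b."""
    return min(math.dist(points[p], points[q]) for p in a for q in b)

def cluster_into(names, k):
    """Merge the closest clusters until only k clusters remain."""
    clusters = [frozenset([name]) for name in names]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: single_linkage(*pair))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return sorted(sorted(c) for c in clusters)

two_clusters = cluster_into(points, 2)
four_clusters = cluster_into(points, 4)
print(two_clusters)
print(four_clusters)
```

With these five made-up points, asking for four clusters only merges the closest pair, while asking for two clusters separates the points into two groups of two and three people.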