Hierarchical Clustering

Hierarchical clustering is an unsupervised machine learning algorithm for grouping unlabeled data points. Like k-means clustering, it groups together data points with similar characteristics, and in some cases the two methods produce similar results.

Types of Hierarchical clustering:

There are two types of hierarchical clustering:

  1. Agglomerative
  2. Divisive

The former clusters data points bottom-up, starting with each data point as its own cluster and merging from there. The latter works top-down: all the data points start in one big cluster, which is then divided into several smaller clusters.

Steps to perform Hierarchical clustering:

The following are the steps involved in agglomerative clustering:

  1. At the start, treat each data point as its own cluster. The number of clusters at the start is therefore K, where K is an integer representing the number of data points.
  2. Form a cluster by joining the two closest data points, resulting in K-1 clusters.
  3. Form another cluster by joining the two closest clusters, resulting in K-2 clusters.
  4. Repeat step 3 until one big cluster is formed.
  5. Once a single cluster is formed, use a dendrogram to divide it into multiple clusters, depending on the problem.
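
The merging steps above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration rather than a reference implementation: the function names, the sample points, and the choice of closest-point distance between clusters are all assumptions made for the example.

```python
import math
from itertools import combinations

def single_linkage(a, b):
    """Distance between the closest pair of points, one from each cluster."""
    return min(math.dist(p, q) for p in a for q in b)

def agglomerative(points):
    """Merge clusters bottom-up and return the merges in the order they happen.

    Each merge is recorded as (cluster_a, cluster_b, distance).
    """
    clusters = [[p] for p in points]  # step 1: K single-point clusters
    merges = []
    while len(clusters) > 1:
        # steps 2-3: find the two closest clusters
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        dist = single_linkage(clusters[i], clusters[j])
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        # i < j, so delete j first to keep index i valid
        del clusters[j]
        del clusters[i]
        clusters.append(merged)  # step 4: repeat until one cluster is left
    return merges

# Illustrative sample data (not from the article)
points = [(1.0, 1.0), (1.5, 1.0), (5.0, 5.0), (5.0, 5.5), (9.0, 1.0)]
merges = agglomerative(points)
for a, b, d in merges:
    print(f"merge {a} + {b} at distance {d:.2f}")
```

With K points the loop performs K-1 merges, which matches the cluster counts in the steps above. Recomputing every pairwise distance on each pass is slow for large data sets; it is written this way only to keep the steps visible.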

There are different ways to measure the distance between two clusters. The underlying point-to-point distance can itself be Euclidean or Manhattan distance.

  • Single linkage: measure the distance between the closest points of two clusters.
  • Complete linkage: measure the distance between the farthest points of two clusters.
  • Centroid linkage: measure the distance between the centroids of two clusters.
  • Average linkage: measure the distance between every pair of points, one from each cluster, and take the mean.
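
To make the four measurements concrete, the sketch below computes each of them for two small clusters, once with Euclidean and once with Manhattan distance. The helper names and the sample clusters are assumptions chosen for illustration.

```python
import math
from itertools import product
from statistics import mean

def euclidean(p, q):
    """Straight-line distance between two points."""
    return math.dist(p, q)

def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    return tuple(mean(coords) for coords in zip(*cluster))

def linkage_distances(a, b, metric=euclidean):
    """Return the four cluster-to-cluster distances for clusters a and b."""
    pairwise = [metric(p, q) for p, q in product(a, b)]
    return {
        "closest points": min(pairwise),
        "farthest points": max(pairwise),
        "centroids": metric(centroid(a), centroid(b)),
        "mean of all pairs": mean(pairwise),
    }

# Illustrative sample clusters (not from the article)
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 2.0)]

euclid = linkage_distances(cluster_a, cluster_b)
manhat = linkage_distances(cluster_a, cluster_b, metric=manhattan)
print(euclid)
print(manhat)
```

For these two clusters the closest-point and centroid distances agree, while the farthest-point and mean distances are larger, so the choice of measurement can change which pair of clusters is merged next.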
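
Role of Dendrograms in Hierarchical clustering:

A dendrogram is a tree diagram that records the order in which clusters were merged and the distance at which each merge happened. Once one large cluster has been formed by combining smaller clusters, the dendrogram is used to split it back into multiple clusters of related data points, typically by cutting the tree at a chosen height.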
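
One way to picture the split is to choose a height on the dendrogram and keep only the merges that happened below it; whatever remains connected forms a cluster. The sketch below does this with a small union-find structure. The merge history is hypothetical, and the function name and the (point, point, height) format are assumptions made for the example.

```python
def cut_dendrogram(n_points, merges, height):
    """Return a cluster label for each point after cutting the tree at `height`.

    `merges` holds (point_i, point_j, merge_height) tuples, where point_i and
    point_j are any members of the two clusters that were merged.
    """
    parent = list(range(n_points))

    def find(x):
        # Follow parent links to the root, shortening the path on the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, h in merges:
        if h <= height:  # keep only merges below the cut line
            parent[find(i)] = find(j)

    # Number the clusters 0, 1, 2, ... in order of first appearance
    roots = {}
    return [roots.setdefault(find(x), len(roots)) for x in range(n_points)]

# Hypothetical merge history for five points (illustrative values)
merges = [(0, 1, 0.5), (2, 3, 0.5), (1, 2, 5.3), (4, 2, 5.7)]

print(cut_dendrogram(5, merges, height=1.0))  # [0, 0, 1, 1, 2]
print(cut_dendrogram(5, merges, height=5.5))  # [0, 0, 0, 0, 1]
print(cut_dendrogram(5, merges, height=6.0))  # [0, 0, 0, 0, 0]
```

A low cut leaves many small clusters and a high cut leaves few large ones, which is why the right height depends on the problem. In practice a library such as SciPy (`scipy.cluster.hierarchy`, with `linkage`, `dendrogram`, and `fcluster`) is usually used instead of hand-written code.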


