Please explain how K-means works.
Answer
K-means is an iterative unsupervised algorithm that groups data into clusters by minimizing intra-cluster distances. It alternates between assigning points to the nearest centroid and updating centroids until convergence. It is fast and easy to implement, but its result depends on the initial centroids, and it handles non-convex cluster shapes poorly.
Goal of K-means: Partition the data into K clusters by minimizing within-cluster variance.
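One common way to write this goal as a formula is the within-cluster sum of squared distances. The notation here is chosen for illustration and is not taken from the original post: $K$ is the number of clusters, $C_k$ is the set of points in cluster $k$, $x_i$ is a data point, and $\mu_k$ is the mean of cluster $k$.
$$J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert_2^2$$
A smaller $J$ means the points sit closer to their own cluster centers.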
K-means Steps:
(1) Initialization: Randomly choose K initial centroids.
(2) Assignment step: Assign each point to the closest centroid using Euclidean distance, given by:
$$d(x_i, \mu_k) = \lVert x_i - \mu_k \rVert_2$$
Where: $x_i$ is the data point.
$\mu_k$ is the cluster center.
(3) Update step: Compute new cluster centers as the mean of all points assigned to that cluster by:
$$\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i$$
Where: $C_k$ represents the set of points assigned to cluster $k$.
(4) Convergence: Repeat assignment and update steps until cluster centers stabilize or a stopping criterion is met.
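The four steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the parameters `max_iter` and `seed`, and the choice to leave an empty cluster's centroid where it was are all assumptions made for this example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of equal-length tuples) into k groups."""
    rng = random.Random(seed)
    # (1) Initialization: pick k data points, without replacement, as centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # (2) Assignment: label each point with the index of its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # (3) Update: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # (4) Convergence: stop once no centroid moves.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Two visibly separated groups of 2-D points (made-up data).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

With data this well separated, the first three points should end up sharing one label and the last three the other. Which group receives label 0 depends on the random initialization.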
An example of K-means clustering is shown below.