ML0057 K-means

Please explain how K-means works.

Answer

K-means is an iterative unsupervised algorithm that groups data into $K$ clusters by minimizing the squared distance between each point and the center (centroid) of its cluster. It alternates between assigning points to the nearest centroid and updating centroids until convergence. It is fast and easy to implement, but the result depends on the initial centroids, and it handles non-convex cluster shapes poorly.

Goal of K-means: Partition data into $K$ clusters by minimizing within-cluster variance.
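Written as a formula, and assuming squared Euclidean distance (the standard formulation, not stated explicitly above), the quantity being minimized is:
$J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - c_k \rVert^2$
Where:
$C_k$ is the set of points assigned to cluster $k$.
$c_k$ is the center of cluster $k$.
Neither the assignment step nor the update step can increase $J$, which is why the iteration converges, although only to a local minimum and not necessarily the global one.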

K-means Steps:
(1) Initialization: Choose $K$ initial centroids at random, for example by picking $K$ of the data points.
(2) Assignment step: Assign each point to the closest centroid using Euclidean distance, given by:
$d(x, c_k) = \sqrt{\sum_{i=1}^{n} (x_i - c_{k,i})^2}$
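Where:
$x$ is the data point, with components $x_i$.
$c_k$ is the center of cluster $k$, with components $c_{k,i}$.
$n$ is the number of features (dimensions).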
(3) Update step: Recompute each cluster center as the mean of the points assigned to that cluster:
$c_k = \frac{1}{|C_k|} \sum_{x \in C_k} x$
Where:
$C_k$ is the set of points assigned to cluster $k$, and $|C_k|$ is the number of points in it.
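(4) Convergence: Repeat the assignment and update steps until the cluster centers stop changing (or move less than a small tolerance), or until another stopping criterion such as a maximum number of iterations is met.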
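
The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `kmeans`, the parameters `max_iters`, `tol`, and `seed`, the choice to initialize from randomly picked data points, and the handling of empty clusters (keep the old centroid) are all assumptions made for the example.

```python
import numpy as np

def kmeans(X, k, max_iters=100, tol=1e-6, seed=0):
    """Minimal K-means sketch. X has shape (m, n): m points, n features."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # (1) Initialization: pick k distinct data points as starting centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    def assign(centers):
        # Euclidean distance from every point to every centroid, shape (m, k).
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        return dists.argmin(axis=1)

    for _ in range(max_iters):
        # (2) Assignment: each point goes to its closest centroid.
        labels = assign(centroids)

        # (3) Update: each centroid becomes the mean of its assigned points.
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # assumption: an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)

        # (4) Convergence: stop once the centroids barely move.
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:
            break

    # Recompute labels so they match the returned centroids.
    return centroids, assign(centroids)


# Usage on two well-separated groups of points (made-up toy data).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
centroids, labels = kmeans(X, k=2)
```

Because the outcome depends on the starting centroids, a common practice is to run the algorithm several times with different seeds and keep the run with the lowest within-cluster sum of squared distances.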