ML0054 KNN Classification

Please explain how KNN classification works.

Answer

K-Nearest Neighbors (KNN) is a simple, non-parametric algorithm that predicts a label by majority vote among the K nearest neighbors of a test point, under a chosen distance metric. It is intuitive and effective on small datasets, but prediction is slow on large-scale data because every training point must be compared against the query.
(1) Instance-based method: KNN does not learn an explicit model; it stores the training data and predicts based on similarity.
(2) Distance-based classification: for a test point \mathbf{x}, it computes the distance to every training point (e.g., Euclidean distance).
(3) Majority vote: it selects the K closest neighbors and assigns the label that appears most frequently among them.
(4) Sensitive to K and the distance metric: performance depends on the choice of K and on how distance is measured (Euclidean, Manhattan, etc.). An odd K is often preferred in binary classification to avoid ties.
(5) No training phase: all computation happens at prediction time (also called lazy learning).
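The steps above can be sketched in plain NumPy. This is a minimal illustration, not a production implementation; the function name knn_predict is my own, not a library API:

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    """Classify x_query by majority vote among its k nearest training points."""
    # Step (2): Euclidean distance from the query to every training point.
    dists = np.sqrt(((X_train - x_query) ** 2).sum(axis=1))
    # Step (3): indices of the k closest neighbors.
    nearest = np.argsort(dists)[:k]
    # Majority vote among the neighbors' labels.
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: three points near the origin (class 0), three near (5, 5) (class 1).
X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])
```

With this data, a query near the origin such as [0.5, 0.5] is assigned class 0, and one near [5.5, 5.5] is assigned class 1. Note there is no fit step at all, which is exactly the "lazy learning" property from point (5).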

Below is the equation for the Euclidean distance:
\mbox{distance}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
Where:
 x_i and y_i are the i-th features of the query and training points, respectively.
 n is the number of features.
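This formula translates directly into code, e.g. with NumPy (euclidean_distance is an illustrative helper, not part of any library):

```python
import numpy as np

def euclidean_distance(x, y):
    """Square root of the summed squared feature differences."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sqrt(np.sum((x - y) ** 2))
```

For example, euclidean_distance([0, 0], [3, 4]) returns 5.0, the classic 3-4-5 right triangle.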

Below is the equation for the Voting Rule in KNN classification:
\hat{y} = \arg\max_{c \in \mathcal{C}} \sum_{i=1}^{K} \mathbb{1}(y_i = c)
Where:
 \hat{y} is the predicted class label for the query point.
 \mathcal{C} is the set of all possible classes.
 K is the number of nearest neighbors considered.
 y_i is the class label of the i-th neighbor.
 \mathbb{1}(y_i = c) is an indicator function, returning 1 if the neighbor's class is c, and 0 otherwise.
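The argmax over summed indicator functions amounts to counting the neighbors' labels and taking the most frequent one, which collections.Counter does directly. A small sketch (note that most_common breaks ties by first-seen order, not by any principled rule):

```python
from collections import Counter

def majority_vote(neighbor_labels):
    """Return the class c maximizing the count of neighbors with label c."""
    counts = Counter(neighbor_labels)      # sum of 1(y_i = c) for each class c
    return counts.most_common(1)[0][0]     # the arg max over classes
```

For instance, majority_vote(["cat", "dog", "cat"]) returns "cat", since "cat" wins the vote 2 to 1.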

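In practice these pieces are usually combined through a library. A minimal end-to-end sketch using scikit-learn's KNeighborsClassifier (assuming scikit-learn is installed; the dataset and parameter choices here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load a small toy dataset and hold out a test split.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# K = 5 neighbors; the default metric is Euclidean distance.
clf = KNeighborsClassifier(n_neighbors=5)
clf.fit(X_tr, y_tr)   # "fitting" just stores the training data (lazy learning)
accuracy = clf.score(X_te, y_te)
```

Trying a few values of n_neighbors (e.g. via cross-validation) is the standard way to handle KNN's sensitivity to the choice of K.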