Tag: SVM

  • ML0053 Hinge Loss for SVM

    Explain the Hinge Loss function used in SVM.

    Answer

    The Hinge Loss function is a key element in Support Vector Machines that penalizes both misclassified points and correctly classified points that lie within the decision margin. It assigns zero loss to points that are correctly classified and lie outside or exactly on the margin, and applies a linearly increasing loss as points move closer to or across the decision boundary. This loss structure encourages the SVM to maximize the margin between classes, promoting robust and generalizable decision boundaries.

    The Hinge Loss is defined as follows.
     \text{Hinge Loss} = \max(0,\ 1 - y \cdot f(\mathbf{x}))
    Where:
 y \in \{-1, +1\} is the true label,
     f(\mathbf{x}) is the raw model output.

    Hinge Loss is plotted in the figure below.

    Zero Loss: When  y \cdot f(\mathbf{x}) \ge 1 , meaning the point is correctly classified with margin.
    Positive Loss: When  y \cdot f(\mathbf{x}) < 1 , the point is either inside the margin or misclassified.
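As a sketch, the two cases above can be checked directly with NumPy (the function name `hinge_loss` is illustrative, not from a specific library):

```python
import numpy as np

def hinge_loss(y, f):
    """Hinge loss max(0, 1 - y*f) for label y in {-1, +1} and raw score f."""
    return np.maximum(0.0, 1.0 - y * f)

# Correctly classified outside or on the margin: y*f >= 1 -> zero loss.
print(hinge_loss(+1, 2.0))   # margin satisfied
# Inside the margin or misclassified: y*f < 1 -> linearly increasing loss.
print(hinge_loss(+1, 0.5))   # inside the margin
print(hinge_loss(-1, 0.5))   # misclassified
```

Note that the loss grows linearly as y·f(x) decreases below 1, which is the "hinge" shape described above.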


  • ML0052 Non-Linear SVM

    Can you explain the concept of a non-linear Support Vector Machine (SVM)?

    Answer

    A non-linear SVM allows classification of data that isn’t linearly separable by using a kernel function to project the data into a higher-dimensional space implicitly. This approach, known as the kernel trick, provides flexibility in handling complex datasets while maintaining computational efficiency. The choice of kernel, such as RBF, polynomial, or sigmoid, can greatly influence the performance and adaptability of the model.

    Kernel Trick: Converts input data into a higher-dimensional space where a linear separation is possible, even if the original data is non-linearly separable.

    Common Kernels:
    Polynomial Kernel:
    Uses polynomial functions of the input features to capture non-linear patterns in the data.

    K(\mathbf{x}_i, \mathbf{x}_j) = (\gamma \mathbf{x}_i^\top \mathbf{x}_j + c)^d
    Where:
     \mathbf{x}_i, \mathbf{x}_j are input vectors.
    \gamma controls the scale of the inner product.
     c is a constant that controls the influence of higher-order terms.
     d is the degree of the polynomial.

    Radial Basis Function (RBF) Kernel:
    Measures local similarity based on the Euclidean distance between points; nearby points have higher similarity.
     K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2\right)
    Where:
     \mathbf{x}_i, \mathbf{x}_j are input vectors.
     \|\mathbf{x}_i - \mathbf{x}_j\|^2 is the squared Euclidean distance between the vectors.
     \gamma controls the width of the Gaussian (spread); it is often parameterized as \gamma = \frac{1}{2\sigma^2}, where \sigma is the Gaussian width.

    Sigmoid Kernel:
    Imitates neural activation by applying a tanh function to the dot product of inputs, introducing non-linearity.
     K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\gamma \mathbf{x}_i^\top \mathbf{x}_j + c)
    Where:
     \mathbf{x}_i, \mathbf{x}_j are input vectors.
    \gamma controls the scale of the inner product.
    c is a bias term.
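The three kernels can be sketched as plain NumPy functions (default values for \gamma, c, and d below are illustrative, not canonical):

```python
import numpy as np

def polynomial_kernel(xi, xj, gamma=1.0, c=1.0, d=3):
    """(gamma * <xi, xj> + c)^d"""
    return (gamma * np.dot(xi, xj) + c) ** d

def rbf_kernel(xi, xj, gamma=0.5):
    """exp(-gamma * ||xi - xj||^2); equals 1 when xi == xj."""
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid_kernel(xi, xj, gamma=0.1, c=0.0):
    """tanh(gamma * <xi, xj> + c)"""
    return np.tanh(gamma * np.dot(xi, xj) + c)

x = np.array([1.0, 2.0])
z = np.array([2.0, 1.0])
print(polynomial_kernel(x, z), rbf_kernel(x, z), sigmoid_kernel(x, z))
```

Each function returns a scalar similarity; stacking these values over all pairs of training points yields the kernel (Gram) matrix used in the dual objective.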

    Objective: Determine an optimal hyperplane in the transformed space that maximizes the margin between classes, effectively improving classification performance.

     \max_{\boldsymbol{\alpha}} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)
    Subject to:
     \sum_{i=1}^{n} \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C \quad \text{for all } i

    The example below compares a Linear Support Vector Machine with a Non-Linear Support Vector Machine.
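A minimal comparison can be sketched with scikit-learn's `SVC` on a dataset that is not linearly separable (the dataset, `gamma`, and noise level are illustrative choices):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a straight line.
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# Linear SVM vs. non-linear SVM with an RBF kernel.
linear_clf = SVC(kernel="linear").fit(X, y)
rbf_clf = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print("linear accuracy:", linear_clf.score(X, y))
print("rbf accuracy:   ", rbf_clf.score(X, y))
```

On this kind of data the RBF kernel typically achieves noticeably higher training accuracy than the linear kernel, because the kernel trick lets it fit the curved class boundary.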



  • ML0051 Linear SVM

    Can you explain the key concepts behind a Linear Support Vector Machine?

    Answer

    A Linear Support Vector Machine (Linear SVM) is a classifier that finds the optimal straight-line (or hyperplane) separating two classes by maximizing the margin between them. It relies on a few critical points (support vectors) and offers strong generalization, especially for linearly separable data.

    Key Concepts of a Linear Support Vector Machine:
    (1) Hyperplane: A decision boundary that separates data points of different classes.
    (2) Margin: The distance between the hyperplane and the nearest data points from each class.
    (3) Support Vectors: Data points that lie closest to the hyperplane and define the margin.
    (4) Objective: Maximize the margin while minimizing classification errors.

    Here is the Linear SVM Decision Function:
     f(\mathbf{x}) = \mathbf{w}^\top \mathbf{x} + b
    Where:
     \mathbf{x} is the input feature vector.
     \mathbf{w} is the weight vector.
     b is the bias term.

    Here is the Linear SVM Classification Rule:
     \hat{y} = \mbox{sign}(\mathbf{w}^\top \mathbf{x} + b) = \mbox{sign}(f(\mathbf{x}))
    Where:
     \hat{y} is the predicted class label.
     \mbox{sign}(\cdot) returns +1 if the argument is ≥ 0, and −1 otherwise.

    For Hard Margin SVM, here is the Optimization Objective:
     \min_{\mathbf{w}, b} \quad \frac{1}{2} \|\mathbf{w}\|^2
    Subject to:
     y_i(\mathbf{w}^\top \mathbf{x}_i + b) \geq 1 \quad \text{for all } i
    Where:
     y_i \in \{-1, +1\} is the class label for the i-th data point.
     \mathbf{x}_i is the i-th feature vector.

    The example below shows Hard Margin SVM for solving a classification task.
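A minimal sketch of a (near) hard margin Linear SVM using scikit-learn's `SVC` on a clearly separable dataset (the blob centers and the large `C` value are illustrative; `SVC` solves the soft-margin problem, and a very large `C` approximates the hard-margin objective):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters, so a hard margin exists.
X, y = make_blobs(n_samples=100, centers=[[-2.0, -2.0], [2.0, 2.0]],
                  cluster_std=0.6, random_state=0)

# Very large C heavily penalizes margin violations (hard-margin approximation).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("training accuracy:", clf.score(X, y))
print("number of support vectors:", clf.support_vectors_.shape[0])
```

Only the support vectors (the points closest to the hyperplane) determine the learned boundary; removing any other training point leaves the solution unchanged.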

