ML0052 Non-Linear SVM


Can you explain the concept of a non-linear Support Vector Machine (SVM)?

Answer

A non-linear SVM classifies data that is not linearly separable by using a kernel function to project the data implicitly into a higher-dimensional space. This approach, known as the kernel trick, lets the model fit complex decision boundaries without ever computing the high-dimensional features explicitly, which keeps it computationally efficient. The choice of kernel (such as RBF, polynomial, or sigmoid) and of its parameters strongly influences how well the model fits the data.

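Kernel Trick: Replaces every inner product between inputs with a kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \phi(\mathbf{x}_i)^\top \phi(\mathbf{x}_j)$, which equals the inner product of the data after a mapping $\phi$ into a higher-dimensional space. A linear separator in that space corresponds to a non-linear boundary in the original space, even when the original data is not linearly separable, and $\phi$ never has to be computed explicitly.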

Common Kernels:
Polynomial Kernel:
Uses polynomial functions of the input features to capture non-linear patterns in the data.

$K(\mathbf{x}_i, \mathbf{x}_j) = (\gamma \mathbf{x}_i^\top \mathbf{x}_j + c)^d$
Where:
$\mathbf{x}_i, \mathbf{x}_j$ are input vectors.
$\gamma$ controls the scale of the inner product.
$c$ is a constant that trades off the influence of higher-order versus lower-order terms.
$d$ is the degree of the polynomial.
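A direct translation of the polynomial kernel formula, as a minimal sketch in plain Python. The function name and the default values of `gamma`, `c` and `d` are illustrative assumptions, not the defaults of any particular library.

```python
def polynomial_kernel(x_i, x_j, gamma=1.0, c=1.0, d=3):
    """Polynomial kernel K(x_i, x_j) = (gamma * x_i^T x_j + c) ** d."""
    inner = sum(a * b for a, b in zip(x_i, x_j))  # x_i^T x_j
    return (gamma * inner + c) ** d


# Inner product is 1*3 + 2*4 = 11, so K = (0.5 * 11 + 1) ** 2
print(polynomial_kernel((1.0, 2.0), (3.0, 4.0), gamma=0.5, c=1.0, d=2))  # → 42.25
```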

Radial Basis Function (RBF) Kernel:
Measures local similarity based on the Euclidean distance between points; nearby points have higher similarity.
$K(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\gamma \|\mathbf{x}_i - \mathbf{x}_j\|^2\right)$
Where:
$\mathbf{x}_i, \mathbf{x}_j$ are input vectors.
$\|\mathbf{x}_i - \mathbf{x}_j\|^2$ is the squared Euclidean distance between the vectors.
$\gamma > 0$ controls the width of the Gaussian: a larger $\gamma$ makes similarity fall off faster with distance.
$\sigma$ is the equivalent spread parameter, related to $\gamma$ by $\gamma = \frac{1}{2\sigma^2}$.
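A minimal sketch of the RBF kernel in plain Python, together with the conversion between $\sigma$ and $\gamma$. The function names and the default `gamma=0.5` (which corresponds to $\sigma = 1$) are illustrative assumptions.

```python
import math


def rbf_kernel(x_i, x_j, gamma=0.5):
    """RBF kernel K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x_i, x_j))  # squared Euclidean distance
    return math.exp(-gamma * sq_dist)


def gamma_from_sigma(sigma):
    """Convert a Gaussian width sigma into gamma = 1 / (2 * sigma^2)."""
    return 1.0 / (2.0 * sigma ** 2)


x = (0.0, 0.0)
print(rbf_kernel(x, x))                      # identical points: maximum similarity 1.0
print(rbf_kernel(x, (1.0, 1.0)))             # squared distance 2, so exp(-1)
print(rbf_kernel(x, (1.0, 1.0), gamma=5.0))  # larger gamma: similarity decays faster
```

Because the similarity depends only on distance, a very large $\gamma$ makes each training point influence only its immediate neighbourhood, while a small $\gamma$ yields smoother, more global decision boundaries.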

Sigmoid Kernel:
Imitates neural activation by applying a tanh function to the dot product of inputs, introducing non-linearity.
$K(\mathbf{x}_i, \mathbf{x}_j) = \tanh(\gamma \mathbf{x}_i^\top \mathbf{x}_j + c)$
Where:
$\mathbf{x}_i, \mathbf{x}_j$ are input vectors.
$\gamma$ controls the scale of the inner product.
$c$ is a bias term.
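A minimal sketch of the sigmoid kernel in plain Python. The function name and the defaults `gamma=0.1`, `c=0.0` are arbitrary illustrative choices, not recommendations from any library.

```python
import math


def sigmoid_kernel(x_i, x_j, gamma=0.1, c=0.0):
    """Sigmoid kernel K(x_i, x_j) = tanh(gamma * x_i^T x_j + c)."""
    inner = sum(a * b for a, b in zip(x_i, x_j))  # x_i^T x_j
    return math.tanh(gamma * inner + c)


print(sigmoid_kernel((1.0, 2.0), (3.0, 4.0)))          # tanh(0.1 * 11)
print(sigmoid_kernel((1.0, 2.0), (3.0, 4.0), c=-2.0))  # a negative bias can make it negative
```

The output always lies in $(-1, 1)$. Unlike the RBF kernel, the sigmoid kernel is not positive semi-definite for every choice of $\gamma$ and $c$, so it does not always correspond to an inner product in some feature space.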

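Objective: Find the hyperplane in the transformed space that maximizes the margin between the classes. In the dual formulation the data appear only through kernel values $K(\mathbf{x}_i, \mathbf{x}_j)$, which is what allows the kernel trick to be applied:

$\max_{\boldsymbol{\alpha}} \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$
subject to $0 \le \alpha_i \le C$ and $\sum_{i=1}^{n} \alpha_i y_i = 0$
Where:
$\alpha_i$ are the Lagrange multipliers, one per training point.
$y_i \in \{-1, +1\}$ are the class labels.
$C$ is the regularization parameter of the soft-margin SVM.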
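The sketch below evaluates the dual objective for a given vector of multipliers on a toy XOR-style dataset. It assumes an RBF kernel and the usual soft-margin constraints $0 \le \alpha_i \le C$, $\sum_i \alpha_i y_i = 0$; the helper names (`dual_objective`, `is_feasible`, `decision_function`), the data, and the multiplier values are hypothetical illustrations. It does not solve the optimization, which in practice is done by a dedicated solver such as SMO. The decision function shown is the standard kernel form $f(\mathbf{x}) = \sum_i \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b$.

```python
import math


def rbf_kernel(x_i, x_j, gamma=0.5):
    """RBF kernel exp(-gamma * ||x_i - x_j||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x_i, x_j)))


def dual_objective(alpha, X, y, kernel):
    """sum_i alpha_i - 1/2 * sum_i sum_j alpha_i alpha_j y_i y_j K(x_i, x_j)."""
    n = len(X)
    quad = 0.0
    for i in range(n):
        for j in range(n):
            quad += alpha[i] * alpha[j] * y[i] * y[j] * kernel(X[i], X[j])
    return sum(alpha) - 0.5 * quad


def is_feasible(alpha, y, C, tol=1e-9):
    """Check the soft-margin dual constraints 0 <= alpha_i <= C and sum_i alpha_i y_i = 0."""
    in_box = all(-tol <= a <= C + tol for a in alpha)
    balanced = abs(sum(a * y_i for a, y_i in zip(alpha, y))) <= tol
    return in_box and balanced


def decision_function(x, alpha, X, y, kernel, b=0.0):
    """f(x) = sum_i alpha_i y_i K(x_i, x) + b; the sign of f(x) is the predicted class."""
    return sum(a * y_i * kernel(x_i, x) for a, y_i, x_i in zip(alpha, y, X)) + b


# XOR-style toy data: not linearly separable in the input space.
X = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
y = [1, 1, -1, -1]
alpha = [0.5, 0.5, 0.5, 0.5]  # illustrative values, not the optimum

print(is_feasible(alpha, y, C=1.0))
print(dual_objective(alpha, X, y, rbf_kernel))
print(decision_function((0.0, 0.0), alpha, X, y, rbf_kernel))  # positive: class +1
print(decision_function((0.0, 1.0), alpha, X, y, rbf_kernel))  # negative: class -1
```

Even with these non-optimal multipliers, the RBF kernel assigns the XOR points to the correct side, something no single straight line in the input space can do.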

The example below compares a Linear Support Vector Machine with a Non-Linear Support Vector Machine.
