ML0022 Cross Entropy Loss

Explain how Cross Entropy Loss is used for a classification task.

Answer

Cross-entropy loss, also known as log loss or logistic loss, is a commonly used loss function in machine learning, particularly for classification tasks. It quantifies the difference between two probability distributions: the predicted probabilities generated by a model and the true probability distribution of the target variable. The goal of training a classification model is to minimize this loss.

For binary classification:
$$\text{Binary Cross-Entropy Loss} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]$$
where:
$n$ is the total number of samples.
$y_i$ is the true label (0 or 1) for the $i$-th data point.
$p_i$ is the predicted probability of the positive class (class 1) for the $i$-th data point.
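A minimal NumPy sketch of the binary formula above (the function name and sample values are illustrative; probabilities are clipped to avoid `log(0)`):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Clip predicted probabilities away from 0 and 1 so log() stays finite
    p = np.clip(p_pred, eps, 1 - eps)
    # Average the per-sample loss: -[y*log(p) + (1-y)*log(1-p)]
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 0])          # true labels
p = np.array([0.9, 0.1, 0.8, 0.3])  # predicted probabilities of class 1
loss = binary_cross_entropy(y, p)   # ≈ 0.1976
```

In practice you would use a library implementation such as `torch.nn.BCELoss` or `tf.keras.losses.BinaryCrossentropy`, which handle numerical stability for you.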

For multi-class classification:
$$\text{Categorical Cross-Entropy Loss} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{C} y_{ij}\log(p_{ij})$$
where:
$n$ is the total number of samples.
$C$ is the number of classes.
$y_{ij}$ is a binary indicator (0 or 1) that is 1 if the true class of the $i$-th data point is $j$, and 0 otherwise (one-hot encoding).
$p_{ij}$ is the predicted probability that the $i$-th data point belongs to class $j$.
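The multi-class formula can be sketched the same way (again, names and sample values are illustrative; each row of `p_pred` is assumed to be a valid probability distribution, e.g. a softmax output):

```python
import numpy as np

def categorical_cross_entropy(y_onehot, p_pred, eps=1e-12):
    # Clip so log() is finite; only the true-class column contributes
    p = np.clip(p_pred, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=1))

# Two samples, three classes (one-hot true labels)
y = np.array([[1, 0, 0],
              [0, 1, 0]])
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
loss = categorical_cross_entropy(y, p)  # ≈ 0.2899
```

Because `y` is one-hot, the inner sum just picks out $-\log(p_{ij})$ for the true class of each sample, which is why this is also called negative log-likelihood.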

The logarithm in the formula penalizes incorrect predictions more severely the more confident the model is in them.
For a true label of 1, the loss grows as the predicted probability $p_i$ approaches 0 and shrinks as $p_i$ approaches 1.
For a true label of 0, the loss grows as $p_i$ approaches 1 and shrinks as $p_i$ approaches 0.
The cross-entropy loss approaches 0 as the predicted probability distribution approaches the true distribution.
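A quick numerical check of this asymmetry (values chosen for illustration): for a true label of 1, a confident correct prediction and a confident wrong one differ in loss by more than an order of magnitude.

```python
import math

# True label is 1; compare a confident correct vs confident wrong prediction
loss_correct = -math.log(0.9)  # ≈ 0.105
loss_wrong   = -math.log(0.1)  # ≈ 2.303, over 20x larger
```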

Key Properties:
Differentiable: The cross-entropy loss is differentiable, which is essential for gradient-based optimization algorithms.
Sensitive to Confidence: It strongly penalizes confident but incorrect predictions.
Probabilistic Interpretation: It operates directly on the predicted class probabilities.
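The differentiability property is what makes training work: when the model's output is a sigmoid over a logit $z$, the gradient of the binary loss with respect to $z$ simplifies to $p - y$. A finite-difference sketch verifying this (variable names are illustrative):

```python
import math

def bce(y, p):
    # Binary cross-entropy for a single sample
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z, y = 0.5, 1.0
h = 1e-6
# Numerical derivative of the loss with respect to the logit z
numeric = (bce(y, sigmoid(z + h)) - bce(y, sigmoid(z - h))) / (2 * h)
# Analytic gradient: sigmoid(z) - y
analytic = sigmoid(z) - y
```

This clean gradient is one reason the sigmoid/softmax + cross-entropy pairing is standard: the derivative is simply the prediction error.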

