ML0022 Cross Entropy Loss

Explain how Cross Entropy Loss is used for a classification task.

Answer

Cross-entropy loss, also known as log loss or logistic loss, is a commonly used loss function in machine learning, particularly for classification tasks. It quantifies the difference between two probability distributions: the predicted probabilities generated by a model and the true probability distribution of the target variable. The goal of training a classification model is to minimize this loss.

For binary classification:
$$\text{Binary Cross-Entropy Loss} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log(p_i) + (1 - y_i)\log(1 - p_i)\right]$$
where:
$n$ is the total number of samples.
$y_i$ is the true label (0 or 1) for the $i$-th data point.
$p_i$ is the predicted probability of the positive class (class 1) for the $i$-th data point.
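A minimal NumPy sketch of the binary formula above (the function name and sample values are illustrative; probabilities are clipped to avoid `log(0)`):

```python
import numpy as np

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    # Clip predicted probabilities away from 0 and 1 so log() stays finite
    p = np.clip(p_pred, eps, 1 - eps)
    # Average the per-sample loss: -[y*log(p) + (1-y)*log(1-p)]
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y = np.array([1, 0, 1, 0])          # true labels
p = np.array([0.9, 0.1, 0.8, 0.3])  # predicted probabilities of class 1
loss = binary_cross_entropy(y, p)   # ≈ 0.1976
```

In practice you would use a library implementation such as `torch.nn.BCELoss` or `tf.keras.losses.BinaryCrossentropy`, which handle numerical stability for you.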

For multi-class classification:
$$\text{Categorical Cross-Entropy Loss} = -\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{C} y_{ij}\log(p_{ij})$$
where:
$n$ is the total number of samples.
$C$ is the number of classes.
$y_{ij}$ is a binary indicator (0 or 1) that is 1 if the true class of the $i$-th data point is $j$, and 0 otherwise (one-hot encoding).
$p_{ij}$ is the predicted probability that the $i$-th data point belongs to class $j$.
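The multi-class formula can be sketched the same way (again, names and sample values are illustrative; each row of `p_pred` is assumed to be a valid probability distribution, e.g. a softmax output):

```python
import numpy as np

def categorical_cross_entropy(y_onehot, p_pred, eps=1e-12):
    # Clip so log() is finite; only the true-class column contributes
    p = np.clip(p_pred, eps, 1.0)
    return -np.mean(np.sum(y_onehot * np.log(p), axis=1))

# Two samples, three classes (one-hot true labels)
y = np.array([[1, 0, 0],
              [0, 1, 0]])
p = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
loss = categorical_cross_entropy(y, p)  # ≈ 0.2899
```

Because `y` is one-hot, the inner sum just picks out $-\log(p_{ij})$ for the true class of each sample, which is why this is also called negative log-likelihood.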

The logarithm in the formula penalizes incorrect predictions more severely the more confident the model is in them.
For a true label of 1, the loss grows as the predicted probability $p_i$ approaches 0 and shrinks as $p_i$ approaches 1.
For a true label of 0, the loss grows as $p_i$ approaches 1 and shrinks as $p_i$ approaches 0.
The cross-entropy loss approaches 0 as the predicted probability distribution approaches the true distribution.
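A quick numerical check of this asymmetry (values chosen for illustration): for a true label of 1, a confident correct prediction and a confident wrong one differ in loss by more than an order of magnitude.

```python
import math

# True label is 1; compare a confident correct vs confident wrong prediction
loss_correct = -math.log(0.9)  # ≈ 0.105
loss_wrong   = -math.log(0.1)  # ≈ 2.303, over 20x larger
```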

Key Properties:
Differentiable: The cross-entropy loss is differentiable, which is essential for gradient-based optimization algorithms.
Sensitive to Confidence: It strongly penalizes confident but incorrect predictions.
Probabilistic Interpretation: It operates directly on the predicted class probabilities.
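The differentiability property is what makes training work: when the model's output is a sigmoid over a logit $z$, the gradient of the binary loss with respect to $z$ simplifies to $p - y$. A finite-difference sketch verifying this (variable names are illustrative):

```python
import math

def bce(y, p):
    # Binary cross-entropy for a single sample
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z, y = 0.5, 1.0
h = 1e-6
# Numerical derivative of the loss with respect to the logit z
numeric = (bce(y, sigmoid(z + h)) - bce(y, sigmoid(z - h))) / (2 * h)
# Analytic gradient: sigmoid(z) - y
analytic = sigmoid(z) - y
```

This clean gradient is one reason the sigmoid/softmax + cross-entropy pairing is standard: the derivative is simply the prediction error.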

