DL0046 Focal Loss

What is focal loss, and why does it help with class imbalance?

Answer

Focal loss augments cross-entropy with a modulating term (1 - p_t)^\gamma and an optional balancing weight \alpha_t. Together these suppress gradients from easy, majority-class examples and amplify learning from hard or minority-class examples, which improves performance under severe class imbalance when the hyperparameters are properly tuned.
(1) Focal loss formula:
\text{FocalLoss}(p_t) = -\alpha_t (1 - p_t)^\gamma \log(p_t)
Where:
p_t is the model probability for the ground-truth class;
\gamma \ge 0 is the focusing parameter that down-weights easy examples;
\alpha_t \in (0,1) is an optional class-balancing weight for class t.
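The formula above can be sketched as a small Python function. The defaults \gamma = 2 and \alpha_t = 0.25 are common choices in practice; treat them, and the function name, as illustrative:

```python
import math

def focal_loss(p_t, gamma=2.0, alpha_t=0.25):
    """Focal loss for a single example, where p_t is the predicted
    probability of the ground-truth class (defaults are illustrative)."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# With gamma = 0 and alpha_t = 1 this reduces to plain cross-entropy,
# which is a handy sanity check for any implementation.
```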
(2) Modulation: The factor (1 - p_t)^\gamma reduces loss from well-classified (high-confidence) examples, concentrating gradients on hard / low-confidence examples.
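The strength of this down-weighting is easy to verify numerically; the probabilities below are hypothetical:

```python
def modulating_factor(p_t, gamma):
    """The (1 - p_t)^gamma term that scales the cross-entropy loss."""
    return (1.0 - p_t) ** gamma

# With gamma = 2, a confident correct prediction (p_t = 0.9) is
# down-weighted by a factor of ~100 relative to plain cross-entropy,
# while a hard example (p_t = 0.1) keeps ~81% of its loss.
easy = modulating_factor(0.9, 2)   # ~0.01
hard = modulating_factor(0.1, 2)   # ~0.81
```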

(3) Class imbalance effect:
In cross-entropy, abundant, easy negatives still produce a large total gradient, dominating learning.
Focal loss down-weights those contributions, ensuring rare/difficult samples have a stronger influence.
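To make the imbalance effect concrete, here is a toy comparison; the counts and probabilities are made up for illustration:

```python
import math

def ce(p_t):
    """Plain cross-entropy for one example."""
    return -math.log(p_t)

def fl(p_t, gamma=2.0):
    """Focal loss (alpha omitted) for one example."""
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# Hypothetical batch: many easy negatives, a few hard positives.
n_easy, p_easy = 10_000, 0.99
n_hard, p_hard = 10, 0.10

# Under cross-entropy, the abundant easy examples dominate the total loss ...
ce_easy, ce_hard = n_easy * ce(p_easy), n_hard * ce(p_hard)
# ... while focal loss (gamma = 2) shifts the balance to the hard examples.
fl_easy, fl_hard = n_easy * fl(p_easy), n_hard * fl(p_hard)
```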

(Figure omitted: cross-entropy and focal-loss curves for several \gamma values and an example \alpha.)

