ML0050 Logistic Regression III

Why is Mean Squared Error (L2 Loss) an unsuitable loss function for logistic regression compared to cross-entropy?

Answer

Mean Squared Error (MSE) is unsuitable for logistic regression primarily because, when composed with the sigmoid function, it produces a non-convex loss landscape, making optimization harder and increasing the risk of converging to a poor solution. It also provides vanishingly small gradients when predictions are confidently incorrect, which slows learning exactly when a strong corrective signal is needed. Cross-entropy loss is better suited: it matches the Bernoulli distribution that logistic regression assumes, produces strong gradients for saturated wrong predictions, and yields a well-behaved convex loss in the single-neuron binary classification setting.
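The gradient claim can be made precise with a short derivation (standard calculus, not part of the original answer). Writing the prediction as ŷ = σ(z) for logit z:

```latex
% Sigmoid output and its derivative
\hat{y} = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\bigl(1 - \sigma(z)\bigr)

% Cross-entropy: L_{CE} = -\bigl[y \log \hat{y} + (1-y)\log(1-\hat{y})\bigr]
\frac{\partial L_{CE}}{\partial z} = \hat{y} - y

% MSE: L_{MSE} = (\hat{y} - y)^2
\frac{\partial L_{MSE}}{\partial z} = 2(\hat{y} - y)\,\hat{y}\,(1 - \hat{y})
```

The MSE gradient carries the extra factor ŷ(1 − ŷ), which vanishes as the sigmoid saturates. For a confidently wrong prediction (say y = 1 but z ≪ 0), the cross-entropy gradient stays near −1 while the MSE gradient is nearly zero.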

(1) Wrong Assumption: minimizing MSE corresponds to maximum likelihood under Gaussian noise, while logistic regression models the label as a Bernoulli (binary) random variable, for which maximum likelihood yields cross-entropy.
(2) Non-convex Optimization: composing MSE with the sigmoid creates a non-convex loss surface, so gradient descent can stall or settle in poor local minima; cross-entropy with a sigmoid remains convex in the weights.
(3) Gradient Issues: the MSE gradient is damped by an extra sigmoid-derivative factor that vanishes when the sigmoid saturates, so confident wrong predictions produce tiny gradients and slow learning; the cross-entropy gradient stays large in exactly those cases.
(4) Interpretation: cross-entropy directly measures the discrepancy between predicted probabilities and true labels, which is the natural objective for classification.
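Point (3) can be checked numerically. The sketch below (a minimal illustration, using hand-derived gradient formulas rather than anything from the original page) evaluates both gradients at a confidently wrong prediction:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ce_grad(z, y):
    # d/dz of binary cross-entropy with a sigmoid output: sigmoid(z) - y
    return sigmoid(z) - y

def mse_grad(z, y):
    # d/dz of (sigmoid(z) - y)^2: the chain rule adds a sigmoid'(z) factor,
    # which vanishes when the sigmoid saturates
    p = sigmoid(z)
    return 2.0 * (p - y) * p * (1.0 - p)

# Confidently wrong prediction: true label is 1, but the logit is very negative.
z, y = -10.0, 1.0
print(ce_grad(z, y))   # magnitude close to 1: a strong corrective signal
print(mse_grad(z, y))  # magnitude close to 0: learning stalls
```

With z = −10 the cross-entropy gradient is roughly −1, while the MSE gradient is on the order of 1e−4, about ten thousand times weaker.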

The figure below shows the non-convex loss surface when MSE is used for logistic regression.
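The non-convexity can also be verified without a plot. The sketch below (an illustrative check on a hypothetical one-example dataset x = 1, y = 1, not from the original page) scans a 1-D slice of the weight and computes second-order finite differences, which a convex function never makes negative:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mse_loss(w, x=1.0, y=1.0):
    return (sigmoid(w * x) - y) ** 2

def ce_loss(w, x=1.0, y=1.0):
    p = sigmoid(w * x)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Grid of weights and second-order finite differences along the slice.
ws = [-6.0 + 0.1 * i for i in range(121)]

def d2(f):
    return [f(ws[i - 1]) - 2 * f(ws[i]) + f(ws[i + 1]) for i in range(1, len(ws) - 1)]

print(min(d2(mse_loss)) < 0)    # True: MSE-with-sigmoid has concave regions
print(min(d2(ce_loss)) >= 0)    # True: cross-entropy stays convex on this slice
```

The MSE curve plateaus as the sigmoid saturates, creating concave regions (negative second differences), while the cross-entropy slice, which is the softplus function here, stays convex throughout.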

