What are the typical reasons for exploding gradient?
Answer
Exploding gradients occur when the gradients during backpropagation become excessively large. This leads to huge updates in the model’s weights, making the training process unstable and potentially causing the model to diverge instead of converging.
Typical Reasons for Exploding Gradients:
1. Deep Architectures:
In very deep networks, the gradient at an early layer is a product of many per-layer terms (weights and activation derivatives) via the chain rule. When those terms are greater than 1, the product can grow exponentially with depth.
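As a minimal numeric sketch of this compounding effect (the per-layer factor of 1.5 and the depth of 50 are illustrative assumptions, not values from any particular network):

```python
# Illustrative only: if each layer contributes a gradient factor of 1.5,
# the chain rule multiplies these factors together across the depth.
factor = 1.5   # assumed per-layer gradient factor (> 1)
depth = 50     # assumed network depth

grad = 1.0
for _ in range(depth):
    grad *= factor

print(grad)  # roughly 6.4e8 -- exponential growth with depth
```

Even a modest per-layer factor, compounded over tens of layers, yields an astronomically large gradient at the earliest layers.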
2. Large Learning Rates:
When the learning rate is set too high, even moderately large gradients can result in weight updates that overshoot the optimum by a significant margin, compounding the instability.
3. Improper Weight Initialization:
If weights are initialized to values that are too high, activations and their corresponding derivatives can be disproportionately large. This imbalance not only disrupts the symmetry in learning but can also contribute to the accumulation of large gradient values.
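A rough simulation can show why initialization scale matters. The sketch below (layer width, depth, and the "too large" standard deviation of 0.2 are all illustrative assumptions) pushes a random vector through a stack of linear layers: a variance-preserving scale of 1/√n keeps activations bounded, while a larger scale blows them up, and the backward pass behaves analogously:

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 256, 30  # assumed layer width and depth

def forward_scale(std):
    """Mean activation magnitude after `depth` random linear layers."""
    x = rng.standard_normal(n)
    for _ in range(depth):
        W = rng.standard_normal((n, n)) * std
        x = W @ x
    return float(np.abs(x).mean())

print(forward_scale(std=(1.0 / n) ** 0.5))  # variance-preserving: stays O(1)
print(forward_scale(std=0.2))               # too large: grows explosively
```

Each layer multiplies the signal's scale by roughly std·√n, so any scale above 1/√n compounds exponentially with depth.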
4. Activation Functions with Derivatives Greater Than 1:
Some activation functions or their operating regimes can have derivatives greater than 1. Repeated multiplication of these large derivatives during backpropagation can lead to exponential growth of the gradients.
A concrete example is the Scaled Exponential Linear Unit (SELU). For positive inputs, SELU is defined as SELU(x) = λx, so its derivative in that regime is the constant λ. In the typical configuration for self-normalizing neural networks, λ is set to approximately 1.0507 (greater than 1). This means that in the positive regime each layer effectively amplifies the gradient by a factor of λ ≈ 1.0507, which, when compounded over many layers, can contribute to exploding gradients if not properly managed.
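The compounding of SELU's positive-regime slope λ ≈ 1.0507 is easy to quantify (the depth of 100 layers is an illustrative assumption):

```python
# SELU's derivative for positive inputs is the constant lambda ~ 1.0507.
# If every layer operates in the positive regime, the gradient picks up
# one factor of lambda per layer.
lam = 1.0507
depth = 100  # assumed depth for illustration

amplification = lam ** depth
print(amplification)  # over two orders of magnitude of growth
```

A factor barely above 1 per layer still produces more than a 100x gradient amplification over 100 layers, which is why such architectures rely on careful initialization and normalization to stay stable.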