What are the typical reasons for exploding gradient?
Answer
Exploding gradients occur when the gradients during backpropagation become excessively large. This leads to huge updates in the model’s weights, making the training process unstable and potentially causing the model to diverge instead of converging.
Typical Reasons for Exploding Gradients:
1. Deep Architectures:
In very deep networks, the gradient at an early layer is a product of many per-layer terms (weights and activation derivatives) via the chain rule. When those terms are greater than 1, the product can grow exponentially with depth.
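As a minimal numeric sketch of this compounding effect (the per-layer factor of 1.5 and the depth of 50 are illustrative assumptions, not values from any particular network):

```python
# Illustrative only: if each layer contributes a gradient factor of 1.5,
# the chain rule multiplies these factors together across the depth.
factor = 1.5   # assumed per-layer gradient factor (> 1)
depth = 50     # assumed network depth

grad = 1.0
for _ in range(depth):
    grad *= factor

print(grad)  # roughly 6.4e8 -- exponential growth with depth
```

Even a modest per-layer factor, compounded over tens of layers, yields an astronomically large gradient at the earliest layers.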
2. Large Learning Rates:
When the learning rate is set too high, even moderately large gradients can result in weight updates that overshoot the optimum by a significant margin, compounding the instability.
3. Improper Weight Initialization:
If weights are initialized to values that are too high, activations and their corresponding derivatives can be disproportionately large. This imbalance not only disrupts the symmetry in learning but can also contribute to the accumulation of large gradient values.
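A rough simulation can show why initialization scale matters. The sketch below (layer width, depth, and the "too large" standard deviation of 0.2 are all illustrative assumptions) pushes a random vector through a stack of linear layers: a variance-preserving scale of 1/√n keeps activations bounded, while a larger scale blows them up, and the backward pass behaves analogously:

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 256, 30  # assumed layer width and depth

def forward_scale(std):
    """Mean activation magnitude after `depth` random linear layers."""
    x = rng.standard_normal(n)
    for _ in range(depth):
        W = rng.standard_normal((n, n)) * std
        x = W @ x
    return float(np.abs(x).mean())

print(forward_scale(std=(1.0 / n) ** 0.5))  # variance-preserving: stays O(1)
print(forward_scale(std=0.2))               # too large: grows explosively
```

Each layer multiplies the signal's scale by roughly std·√n, so any scale above 1/√n compounds exponentially with depth.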
4. Activation Functions with Derivatives Greater Than 1:
Some activation functions or their operating regimes can have derivatives greater than 1. Repeated multiplication of these large derivatives during backpropagation can lead to exponential growth of the gradients.
A concrete example is the Scaled Exponential Linear Unit (SELU). For positive inputs, SELU is defined as SELU(x) = λx, so its derivative in that regime is the constant λ. In the typical configuration for self-normalizing neural networks, λ is set to approximately 1.0507 (greater than 1). This means that in the positive regime each layer effectively amplifies the gradient by a factor of λ ≈ 1.0507, which, when compounded over many layers, can contribute to exploding gradients if not properly managed.
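The compounding of SELU's positive-regime slope λ ≈ 1.0507 is easy to quantify (the depth of 100 layers is an illustrative assumption):

```python
# SELU's derivative for positive inputs is the constant lambda ~ 1.0507.
# If every layer operates in the positive regime, the gradient picks up
# one factor of lambda per layer.
lam = 1.0507
depth = 100  # assumed depth for illustration

amplification = lam ** depth
print(amplification)  # over two orders of magnitude of growth
```

A factor barely above 1 per layer still produces more than a 100x gradient amplification over 100 layers, which is why such architectures rely on careful initialization and normalization to stay stable.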