What are the potential consequences of initializing all weights to one in a deep learning model?
Answer
Below are the key consequences of initializing all weights in a deep-learning model to one (or any constant value), which illustrate why random, scaled initialization schemes (e.g., Xavier/He) are essential.
(1) Symmetry Problem: Neurons receive identical gradients, causing them to learn the same features rather than developing distinct representations.
(2) Limited Representational Capacity: The network cannot capture complex, varied patterns because all neurons behave identically.
(3) Slow or No Convergence: Because every neuron computes the same output and receives the same update, gradient descent cannot move the weights toward a good optimum, so training stalls or converges very slowly. (Figure: training-loss comparison, ones initialization vs. random initialization.)
(4) Activation Saturation: Constant weights can push pre-activations into the saturated regions of activation functions (e.g., sigmoid, tanh), where the derivative is near zero, leading to vanishing gradients.
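The symmetry problem in points (1) and (2) is easy to verify numerically. Below is a minimal sketch (a hypothetical two-layer network with made-up shapes, not any specific framework's API) showing that with all-ones weights, every hidden neuron computes the same activation and receives the same gradient, so the neurons can never differentiate from one another:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))   # batch of 4 inputs, 3 features (arbitrary example data)
y = rng.normal(size=(4, 1))   # arbitrary regression targets

W1 = np.ones((3, 5))          # all-ones first-layer weights
W2 = np.ones((5, 1))          # all-ones second-layer weights

h = np.tanh(x @ W1)           # hidden activations: all 5 columns are identical
pred = h @ W2
err = pred - y                # dL/dpred for a 0.5 * MSE loss

# Backpropagation by hand
dW2 = h.T @ err               # gradient w.r.t. W2
dh = err @ W2.T
dW1 = x.T @ (dh * (1 - h**2)) # gradient w.r.t. W1 (tanh derivative)

# Every hidden unit has the same output...
print(np.allclose(h, h[:, :1]))      # True
# ...and every hidden unit's weights get the same gradient,
# so after the update the weights are still identical.
print(np.allclose(dW1, dW1[:, :1]))  # True
```

Since each gradient-descent step applies an identical update to identical weights, the symmetry persists for the entire training run; the 5-unit hidden layer effectively behaves like a single neuron. Random initialization breaks this symmetry from step one.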