What is Learning Rate Warmup? What is the purpose of using Learning Rate Warmup?
Answer
Learning Rate Warmup is a training technique where the learning rate starts from a small value and gradually increases to a target (base) learning rate over the first few steps or epochs of training.
Purpose of Using Learning Rate Warmup:
(1) Stabilizes Early Training: At the beginning of training, weights are randomly initialized, making the model sensitive to large updates. A warmup gradually increases the learning rate, preventing unstable behavior.
(2) Allows Optimizers to Adapt: Optimizers like Adam and AdamW rely on gradient moment estimates that are noisy at the start of training. Warmup gives these optimizers time to accumulate more accurate statistics before a high learning rate is applied.
(3) Enables Large-Batch Training: Warmup mitigates the training instability that can arise when a large batch size is combined with a correspondingly high learning rate.
Below is an example using Learning Rate Warmup followed by Cosine Decay.
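A minimal sketch of such a schedule in plain Python (the function name `lr_schedule` and the hyperparameter values are illustrative, not from any particular library): the learning rate rises linearly from near zero to the base rate over the warmup steps, then follows a cosine curve down to a minimum rate.

```python
import math

def lr_schedule(step, base_lr=1e-3, warmup_steps=1000, total_steps=10000, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Linear warmup: scale the base LR by the fraction of warmup completed.
        return base_lr * (step + 1) / warmup_steps
    # Cosine decay over the remaining steps: progress goes from 0 to 1.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

In practice you would call `lr_schedule(step)` once per optimizer step and assign the result to the optimizer's learning rate; frameworks such as PyTorch expose equivalent behavior through `torch.optim.lr_scheduler.LambdaLR` or `CosineAnnealingLR` combined with a warmup phase.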