DL0013 Instance Normalization

Can you explain what Instance Normalization is in the context of deep learning?

Answer

Instance Normalization (IN) normalizes each individual data sample (typically per channel) by subtracting that sample's own mean and dividing by its standard deviation, then applying a learnable scale and shift. Because each instance is normalized independently, the result does not depend on the mini-batch composition, which makes IN well suited to applications that need per-instance adjustment, such as artistic style transfer.

Here are the equations for computing the Instance Normalization output y_{nchw} from input x_{nchw}:

\mu_{nc} = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} x_{nchw}
\sigma_{nc}^2 = \frac{1}{HW} \sum_{h=1}^{H} \sum_{w=1}^{W} (x_{nchw} - \mu_{nc})^2
\hat{x}_{nchw} = \frac{x_{nchw} - \mu_{nc}}{\sqrt{\sigma_{nc}^2 + \epsilon}}
y_{nchw} = \gamma_c \hat{x}_{nchw} + \beta_c
Where:
x_{nchw} is the input feature at batch index n, channel c, height h, and width w.
H is the height of the feature map (number of rows per channel).
W is the width of the feature map (number of columns per channel).
\mu_{nc} is the mean over all spatial positions in channel c of instance n.
\sigma_{nc}^2 is the variance over all spatial positions in channel c of instance n.
\hat{x}_{nchw} is the normalized value after subtracting the mean and dividing by the standard deviation.
\epsilon is a small constant added to the denominator to prevent division by zero and improve numerical stability.
\gamma_c is a learnable scale parameter for channel c.
\beta_c is a learnable shift parameter for channel c.
y_{nchw} is the final output after normalization, scaling, and shifting.
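The equations above can be sketched directly in NumPy. This is a minimal illustrative implementation, not a production kernel; the function name and signature are our own choice, and in practice you would use a framework layer such as PyTorch's InstanceNorm2d.

```python
import numpy as np

def instance_norm(x, gamma, beta, eps=1e-5):
    """Instance Normalization over an (N, C, H, W) input.

    Each (n, c) slice is normalized by its own spatial mean and
    variance, then scaled by gamma[c] and shifted by beta[c].
    """
    mu = x.mean(axis=(2, 3), keepdims=True)    # mu_{nc}: per-instance, per-channel mean
    var = x.var(axis=(2, 3), keepdims=True)    # sigma^2_{nc}: per-instance, per-channel variance
    x_hat = (x - mu) / np.sqrt(var + eps)      # normalized activations
    # Broadcast the per-channel affine parameters over N, H, W.
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

# With gamma = 1 and beta = 0, every (n, c) slice of the output has
# approximately zero mean and unit variance, regardless of batch size.
x = np.random.randn(2, 3, 4, 4)
y = instance_norm(x, gamma=np.ones(3), beta=np.zeros(3))
```

Note that the statistics are computed per sample and per channel (axes H and W only), which is exactly what distinguishes IN from Batch Normalization, where the mean and variance are additionally averaged over the batch axis.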

