DL0004 Small Kernels

Written by

What are the key advantages of using small convolutional kernels, such as 3×3, over utilizing a few larger kernels in deep learning architectures?

Answer

Using small convolutional kernels instead of a few larger kernels offers several significant advantages in deep learning architectures:

(1) Deeper Networks & More Non-Linearity: Stacking multiple 3×3 layers (e.g., three 3×3 layers) allows for a deeper network with more non-linear activation functions compared to a single large kernel.
(2) Reduced Parameters: Multiple small kernels can achieve the same receptive field as a larger one, but with fewer parameters.
Example: Two stacked 3×3 layers ( $18 \cdot C_{in} \cdot C_{out}$ total parameters) have the same receptive field as a 5×5 layer ( $25 \cdot C_{in} \cdot C_{out}$ total parameters) but fewer parameters.

(3) Computational Efficiency: Fewer parameters in smaller kernels generally lead to lower computation costs during training and inference.
(4) Gradual Receptive Field Expansion: Successive 3×3 convolutions progressively build a larger receptive field while maintaining fine detail. (3×3 filters focus on local detail capture with pixel neighborhoods, ideal for textures or edges.)

Did you solve the problem?

DL0004 Small Kernels

Comments

Leave a Reply Cancel reply

More posts

MSD0007 Demand Forecasting System for Retailer

MSD0006 Video Recommendation System

MSD0005 Surveillance Video Anomaly Detection

DL0052 Rotary Positional Embedding