What are the benefits of using 1×1 convolutional layers in deep learning architectures?
Answer
A 1×1 convolution, also known as a pointwise convolution, is a convolutional operation where the kernel size is 1×1, which plays several crucial roles in deep learning architectures.
(1) Dimensionality control: 1×1 convolution can reduce or expand the number of feature maps, trading off representational capacity and computational cost.
For example, Bottleneck designs: In architectures like ResNet’s bottleneck block, a 1×1 conv first reduces channels (e.g., 256→64), then a 3×3 conv processes those, and finally another 1×1 conv expands back (64→256) to restore capacity while keeping compute manageable.
(2) Increased Network Depth with Controlled Cost: Allows for the design of deeper networks by reducing channel dimensionality before computationally expensive spatial convolutions.
(3) Cross-Channel Feature Fusion: Enables interaction and combination of information across different feature channels at the same spatial location.
(4) Non-linear mixing: When followed by activations (ReLU, etc.), 1×1 convolutions introduce non‐linear channel mixing that enhances model expressiveness.

