Tag: CNN

  • DL0024 Fixed-size Input in CNN

    What is the “dilemma of fixed-size input” for CNNs? How is it typically resolved?

    Answer

    The “dilemma of fixed-size input” for Convolutional Neural Networks (CNNs) refers to the requirement that traditional CNN architectures demand input images of a predetermined, fixed size. This presents a challenge because real-world images often vary widely in dimensions.

    Fixed Input Requirement: Traditional CNN architectures (like VGG or ResNet) require inputs of a fixed size due to the structure of fully connected layers at the end.
    Data Preprocessing Constraint: Real-world images vary in size, so they must be resized or cropped, which may distort or lose important features.
    Inefficiency & Information Loss: Resizing may stretch or compress content unnaturally, affecting model performance.

    The figure below shows an example of information loss caused by resizing or cropping.

    Common Solutions for the dilemma of fixed-size input:
    (1) Global Average Pooling (GAP): Replaces fully connected layers, allowing input of variable size and reducing overfitting.
    (2) Fully Convolutional Networks (FCNs): Use only convolutional and pooling layers, which can handle variable-sized inputs.
    (3) Adaptive Pooling (e.g., in PyTorch): Pools features to a fixed size regardless of input dimensions.
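As a concrete sketch of solution (1), here is how global average pooling removes the fixed-size constraint; a minimal NumPy version, with the channel count and spatial sizes chosen arbitrarily for illustration:

```python
import numpy as np

def global_average_pool(feature_maps):
    """Average each channel's spatial values: (C, H, W) -> (C,)."""
    return feature_maps.mean(axis=(1, 2))

# Two feature-map stacks with different spatial sizes but the same channel count
small = np.random.rand(64, 7, 7)    # e.g. from a smaller input image
large = np.random.rand(64, 12, 12)  # e.g. from a larger input image

# GAP yields a fixed-length 64-d vector either way, so the classifier
# that follows never sees a varying input size.
assert global_average_pool(small).shape == (64,)
assert global_average_pool(large).shape == (64,)
```

Because the output length depends only on the number of channels, the network after the GAP layer is identical regardless of the input resolution.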


  • DL0023 Dilated Convolution

    What are dilated convolutions? When would you use them?

    Answer

    Dilated convolutions enhance standard convolution by inserting gaps between filter elements, thereby allowing the network to gather more context (a larger receptive field) without an increase in parameters or a reduction in resolution.

    Dilated convolutions (also known as atrous convolutions) modify standard convolution by inserting gaps (zeros) between kernel elements. The “dilation rate” controls the spacing of these gaps; a dilation rate of 1 recovers standard convolution.

    Contrast with Pooling:
    Pooling reduces spatial resolution (downsamples) while increasing the receptive field.
    Dilated convolutions increase the receptive field without reducing resolution.

    Multi-Scale Feature Extraction:
    By adjusting the dilation rate, these convolutions can aggregate features from both local neighborhoods and larger regions, making it easier for the network to learn from multi-scale context.

    Common Use Cases: Any task needing large receptive fields without downsampling.
    (1) Semantic segmentation (e.g., DeepLab): Expand the receptive field and capture multi-scale context.
    (2) Audio processing (e.g., WaveNet): Model long-range temporal dependencies.
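The mechanics above can be sketched in a few lines; a minimal NumPy implementation of a “valid” 1D dilated convolution (the box kernel and ramp input are arbitrary illustrative choices):

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """'Valid' 1D convolution (cross-correlation) whose kernel taps are
    spaced `dilation` apart, i.e. (dilation - 1) implicit zeros between taps."""
    k = len(kernel)
    span = (k - 1) * dilation + 1          # effective kernel size
    out_len = len(x) - span + 1
    return np.array([
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(out_len)
    ])

x = np.arange(10, dtype=float)
box = np.array([1.0, 1.0, 1.0])

y1 = dilated_conv1d(x, box, dilation=1)  # taps at offsets 0, 1, 2
y2 = dilated_conv1d(x, box, dilation=2)  # taps at offsets 0, 2, 4

# dilation=1 is plain convolution; dilation=2 covers a span of 5 inputs
# per output while still using only 3 weights and full resolution.
```

Note the effective kernel size formula `(k - 1) * dilation + 1`: a 3-tap kernel at dilation 2 sees 5 inputs, at dilation 4 it sees 9, with no extra parameters.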

    Here is a 1D dilated convolution illustration.

    Here is a 2D dilated convolution illustration.



  • DL0022 CNN Architecture

    Describe the typical architecture of a CNN.

    Answer

    A Convolutional Neural Network (CNN) is structured to efficiently recognize complex patterns in data. It begins with an input layer that feeds in raw data. Convolutional layers then extract key features using filters, and non-linear activation functions such as ReLU are applied to their outputs. Pooling layers reduce the spatial dimensions of the resulting feature maps, improving computational efficiency and promoting invariance to small shifts. The extracted features are flattened and passed through fully connected layers that culminate in an output layer for final predictions, typically using a softmax function for classification tasks. Optional techniques, such as dropout and batch normalization, further refine learning and help prevent overfitting.

    (1) Input Layer: Accepts raw data as multi-dimensional arrays.
    (2) Convolutional Layers: Use learnable filters (kernels) to scan the input and extract local features.
    (3) Activation Functions: Apply non-linearity (commonly ReLU) after each convolution operation.
    (4) Pooling Layers: Downsample feature maps using techniques like max or average pooling to reduce spatial dimensions and computations.
    (5) Stacked Convolutional and Pooling Blocks: Multiple iterations to progressively extract intricate hierarchical features.
    (6) Flattening: Converts feature maps into one-dimensional vectors.
    (7) Fully Connected Layers: Learn complex patterns and perform decision-making.
    (8) Output Layer: Produces final predictions using appropriate activation functions (e.g., softmax for classification).
    (9) Additional Components (Optional): Dropout for regularization, batch normalization for training stability, and skip connections in more advanced models.
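The numbered stages above can be sketched end to end; a minimal NumPy forward pass with arbitrary (untrained) weights, assuming an 8x8 single-channel input and a 10-class output purely for illustration:

```python
import numpy as np
rng = np.random.default_rng(0)

def conv2d(img, kernel):                      # (2) convolution, 'valid'
    H, W = img.shape
    k = kernel.shape[0]
    return np.array([[np.sum(img[i:i+k, j:j+k] * kernel)
                      for j in range(W - k + 1)]
                     for i in range(H - k + 1)])

def relu(x):                                  # (3) activation
    return np.maximum(x, 0)

def max_pool(x, s=2):                         # (4) pooling
    H, W = x.shape
    return np.array([[x[i:i+s, j:j+s].max()
                      for j in range(0, W - s + 1, s)]
                     for i in range(0, H - s + 1, s)])

def softmax(z):                               # (8) output activation
    e = np.exp(z - z.max())
    return e / e.sum()

image = rng.random((8, 8))                    # (1) input
kernel = rng.standard_normal((3, 3))          # learnable filter
feat = max_pool(relu(conv2d(image, kernel)))  # (2)-(4): 8x8 -> 6x6 -> 3x3
vec = feat.flatten()                          # (6) flatten: 9 values
W_fc = rng.standard_normal((10, vec.size))    # (7) fully connected weights
probs = softmax(W_fc @ vec)                   # (8) 10-class prediction

assert probs.shape == (10,) and np.isclose(probs.sum(), 1.0)
```

In a real network, stages (2)-(4) would be stacked several times (step 5), and the weights would be learned by backpropagation rather than sampled randomly.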

    Below is a visual representation of a typical CNN architecture. Padding is used in convolution to maintain dimensions.
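The effect of padding on dimensions follows the standard output-size formula; a small helper as a sanity check (the layer sizes below are arbitrary examples):

```python
def conv_output_size(n, k, p=0, s=1):
    """Spatial output size of a convolution over an n x n input with
    kernel size k, padding p, stride s: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

# 'Same' padding p = (k - 1) // 2 preserves dimensions at stride 1:
assert conv_output_size(32, 3, p=1) == 32
assert conv_output_size(32, 5, p=2) == 32
# Without padding, the feature map shrinks by k - 1 per layer:
assert conv_output_size(32, 3, p=0) == 30
```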



  • DL0021 Feature Map

    What is the feature map in Convolutional Neural Networks?

    Answer

    A feature map is the output of a convolution operation in a Convolutional Neural Network (CNN) that highlights where specific features appear in the input, enabling the network to understand patterns and structures in input data.

    Feature Map in CNNs:
    (1) Output of a Filter: It’s the 2D (or 3D) output generated when a single convolutional filter slides across the input data.
    (2) Highlighting a Specific Feature: Each feature map represents the spatial locations and strengths where a particular pattern or characteristic (e.g., a vertical edge, a specific texture, a corner) is detected in the input.
    (3) Multiple Feature Maps per Layer: A convolutional layer typically uses multiple filters, with each filter producing its unique feature map.
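As a concrete sketch of points (1)-(2), here is a feature map computed with a hand-made vertical-edge filter on a toy image (both the half-dark image and the Sobel-like kernel are illustrative choices):

```python
import numpy as np

# A toy image: left half dark (0), right half bright (1)
img = np.zeros((6, 6))
img[:, 3:] = 1.0

# A vertical-edge filter (Sobel-like kernel)
vert = np.array([[-1., 0., 1.],
                 [-2., 0., 2.],
                 [-1., 0., 1.]])

def conv2d(img, k):
    n = k.shape[0]
    H, W = img.shape
    return np.array([[np.sum(img[i:i+n, j:j+n] * k)
                      for j in range(W - n + 1)]
                     for i in range(H - n + 1)])

fmap = conv2d(img, vert)   # the 4x4 feature map
# fmap responds strongly (value 4) in the columns straddling the
# dark-to-bright boundary and is zero over the flat regions -- the
# map records *where* the vertical edge occurs.
```

A horizontal-edge kernel applied to the same image would yield an all-zero map, illustrating how each filter's map highlights one specific feature.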

    The figure below shows feature maps computed by applying different filters to the original image.



  • DL0020 CNN Parameter Sharing

    How do Convolutional Neural Networks achieve parameter sharing? Why is it beneficial?

    Answer

    Convolutional Neural Networks (CNNs) share parameters by using the same convolutional filter across different spatial locations, enabling them to learn location-independent features efficiently with fewer parameters and better generalization.

    How CNNs Achieve Parameter Sharing:
    (1) Convolutional Filters/Kernels: A small matrix of learnable weights (the filter) is defined.
    (2) Sliding Window Operation: This filter slides across the entire input image (or feature map).
    (3) Weight Reuse: The same weights within that filter are used to compute outputs at every spatial location where the filter is applied.
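To get a feel for the savings from weight reuse, here is a rough back-of-the-envelope comparison (the 224x224 layer sizes are arbitrary illustrative choices):

```python
# Hypothetical sizes: a 224x224 single-channel input mapped to a
# 224x224 single-channel output.
H = W = 224

# Fully connected: every output unit has its own weight to every input pixel.
fc_params = (H * W) * (H * W)   # 50176^2, roughly 2.5 billion weights

# Convolutional: one shared 3x3 filter, reused at every spatial location.
conv_params = 3 * 3             # 9 weights (plus perhaps one bias)

assert conv_params == 9
assert fc_params == 50176 ** 2
```

The convolutional layer's parameter count is independent of the image size, which is exactly what weight reuse buys.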

    Why Parameter Sharing is Beneficial:
    (1) Reduced Parameters: Significantly fewer learnable parameters compared to fully connected networks.
    (2) Translation equivariance: Detects features regardless of their position in the image.
    The following example demonstrates translation equivariance using a CNN-like convolution with a shared filter.

    (3) Improved Generalization: Less prone to overfitting due to fewer parameters.
    (4) Computational Efficiency: Faster training and inference.
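The equivariance in point (2) can be checked directly; a minimal NumPy sketch in which shifting the input shifts the convolution output by the same amount, because the same filter weights are applied at every location (the Laplacian-like kernel and single-dot image are arbitrary illustrative choices):

```python
import numpy as np

def conv2d(img, k):
    n = k.shape[0]
    H, W = img.shape
    return np.array([[np.sum(img[i:i+n, j:j+n] * k)
                      for j in range(W - n + 1)]
                     for i in range(H - n + 1)])

kernel = np.array([[0.,  1., 0.],
                   [1., -4., 1.],
                   [0.,  1., 0.]])   # one shared (Laplacian-like) filter

img = np.zeros((8, 8))
img[2, 2] = 1.0                                    # a dot at (2, 2)
shifted = np.roll(img, shift=(3, 1), axis=(0, 1))  # same dot at (5, 3)

out1 = conv2d(img, kernel)
out2 = conv2d(shifted, kernel)

# Shifting the input shifts the output identically (away from borders):
assert np.allclose(np.roll(out1, shift=(3, 1), axis=(0, 1)), out2)
```

With per-location weights (as in a fully connected layer), this property would not hold: the network would have to relearn the same dot detector at every position.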

