Deep Learning Interview Questions Table
| ID | Title | Content | Tags | Difficulty |
|---|---|---|---|---|
| 1332 | DL0052 Rotary Positional Embedding | What is Rotary Positional Embeddin… | Transformer | Medium |
| 1310 | DL0051 Sparsity in NN | Explain the concept of "Sparsity" … | NN | Medium |
| 1229 | DL0050 Knowledge Distillation | Describe the process and benefits … | Basics | Medium |
| 1213 | DL0049 Weight Init | Why is "weight initialization" imp… | Basics | Easy |
| 1205 | DL0048 Adam Optimizer | Can you explain how the Adam optim… | Basics | Easy |
| 1197 | DL0047 Focal Loss II | Please compare focal loss and weig… | Loss | Medium |
| 1185 | DL0046 Focal Loss | What is focal loss, and why does i… | Loss | Easy |
| 1172 | DL0045 Dimension in FFN | In Transformers, why does the feed… | Transformer | Medium |
| 1162 | DL0044 Multi-Query Attention | What is Multi-Query Attention in t… | Transformer | Medium |
| 1152 | DL0043 KV Cache | What is KV Cache in transformers, … | Transformer | Easy |
| 1145 | DL0042 Attention Computation | Please break down the computationa… | Transformer | Medium |
| 1125 | DL0041 Hierarchical Attention | Could you explain the concept of h… | Transformer | Medium |
| 1121 | DL0040 Attention Mask | What is the role of masking in att… | Transformer | Easy |
| 1114 | DL0039 Transformer Weight Tying | Explain weight sharing in Transfor… | Transformer | Hard |
| 1109 | DL0038 Transformer Activation | Which activation functions do tran… | Transformer | Easy |
| 1103 | DL0037 Transformer Architecture III | Why do Transformers use a dot prod… | Transformer | Medium |
| 1097 | DL0036 Transformer Architecture II | What are the main differences betw… | Transformer | Easy |
| 1083 | DL0035 Transformer Architecture | Describe the original Transformer … | Transformer | Easy |
| 1077 | DL0034 Layer Norm | What is layer normalization, and w… | Norm Transformer | Easy |
| 1065 | DL0033 Transformer Computation | In a Transformer architecture, whi… | Transformer | Hard |
| 1055 | DL0032 Transformer VS RNN | What makes Transformers more paral… | RNN Transformer | Easy |
| 1049 | DL0031 FFN in Transformer | What is the purpose of the feed-fo… | Transformer | Easy |
| 1044 | DL0030 Positional Encoding | Explain "Positional Encoding" in T… | Transformer | Easy |
| 1024 | DL0029 Dilated Attention | Could you explain the concept of d… | Transformer | Medium |
| 1012 | DL0028 Sliding Window Attention | Explain the sliding window attenti… | Transformer | Medium |
| 1002 | DL0027 Multi-Head Attention | How does multi-head attention work… | Transformer | Easy |
| 996 | DL0026 Self-Attention vs Cross-Attention | What distinguishes self-attention … | Transformer | Easy |
| 948 | DL0025 Attention Mechanism | Please explain the concept of "Att… | Transformer | Easy |
| 795 | DL0024 Fixed-size Input in CNN | What is the "dilemma of fixed-size… | CNN | Medium |
| 787 | DL0023 Dilated Convolution | What are dilated convolutions? Whe… | CNN | Medium |
| 783 | DL0022 CNN Architecture | Describe the typical architecture … | CNN | Easy |
| 779 | DL0021 Feature Map | What is the feature map in Convolu… | CNN | Easy |
| 775 | DL0020 CNN Parameter Sharing | How do Convolutional Neural Networ… | CNN | Easy |
| 763 | DL0019 Go Deep | How does increasing network depth … | NN | Medium |
| 761 | DL0018 NaN Values | What are the common causes for a d… | Basics | Medium |
| 757 | DL0017 Reproducibility | How to ensure the reproducibility … | Basics | Easy |
| 752 | DL0016 Learning Rate Warmup | What is Learning Rate Warmup? What… | Basics | Easy |
| 719 | DL0015 Cold Start | What is a "cold start" problem in … | Basics | Medium |
| 686 | DL0014 Mixed Precision Training | Can you explain the primary benefi… | Basics | Medium |
| 674 | DL0013 Instance Normalization | Can you explain what Instance Norm… | Norm | Medium |
| 652 | DL0012 Zero Padding | Why is zero padding used in deep l… | Basics NN | Medium |
| 618 | DL0011 Fully Connected Layer | Can you explain what a fully conne… | Basics NN | Easy |
| 576 | DL0010 Receptive Field | What is the receptive field in con… | Basics NN | Medium |
| 570 | DL0009 Pooling | Please compare max pooling and ave… | Basics | Easy |
| 567 | DL0008 Hyperparameter Tuning | What are the common strategies for… | Basics | Easy |
| 563 | DL0007 Batch Norm | Why use batch normalization in dee… | Norm | Easy |
| 560 | DL0006 Layer Freeze in TL | What are the common strategies for… | Basics | Easy |
| 557 | DL0005 Transfer Learning | Why use transfer learning in deep … | Basics | Easy |
| 549 | DL0004 Small Kernels | What are the key advantages of usi… | NN | Easy |
| 542 | DL0003 1×1 Convolution | What are the benefits of using 1×1… | NN | Medium |
| 532 | DL0002 All Ones Init | What are the potential consequence… | Basics NN | Medium |
| 462 | DL0001 Residual Connection | Why are residual connections impor… | Basics NN | Medium |
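For quick reference alongside the attention-related entries above (e.g. DL0025 Attention Mechanism and DL0040 Attention Mask), below is a minimal NumPy sketch of scaled dot-product attention with an optional mask. It is an illustrative example only; the function name, shapes, and the causal-mask usage are assumptions for demonstration, not taken from any question's reference answer.

```python
# Minimal sketch of scaled dot-product attention with an optional mask.
# Shapes and names are illustrative assumptions.
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q, K, V: arrays of shape (seq_len, d_k); mask: (seq_len, seq_len) of 0/1."""
    d_k = Q.shape[-1]
    # Similarity scores, scaled by sqrt(d_k) to keep softmax inputs well-behaved.
    scores = Q @ K.T / np.sqrt(d_k)
    if mask is not None:
        # Masked positions (mask == 0) get a large negative score,
        # so softmax assigns them near-zero weight.
        scores = np.where(mask == 0, -1e9, scores)
    # Row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Example: causal (lower-triangular) mask, as used in decoder self-attention.
seq_len, d_k = 4, 8
rng = np.random.default_rng(0)
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))
causal_mask = np.tril(np.ones((seq_len, seq_len)))
out = scaled_dot_product_attention(Q, K, V, mask=causal_mask)
print(out.shape)  # (4, 8)
```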