Interview for Machine Learning

Tag: System

MSD0007 Demand Forecasting System for Retailer
Design a demand forecasting system for a large retail company like Costco, Walmart, or Target. The system should predict future product demand across stores and time to support inventory planning, replenishment, and promotions.
Answer
The demand forecasting system ingests sales, inventory, weather, and promotion data and predicts product demand with ML models such as classical time series methods, tree-based models, or deep learning.
Its architecture combines scalable data pipelines, real-time processing, and integration with inventory management.
Key benefits include reducing stockouts, optimizing supply chains, and improving accuracy through iterative model training.
Problem Definition & Success Metrics:
Define the forecast granularity (e.g., SKU-store-day), horizon (e.g., 2-week operational, 3-month tactical), and objective (e.g., minimize out-of-stocks and waste).
Key success metrics would be Weighted Mean Absolute Percentage Error (WMAPE) for overall accuracy and forecast bias to detect systematic over/under-prediction.
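The two metrics above can be sketched in a few lines. This is a minimal illustration, assuming WMAPE is defined as total absolute error over total actual demand and bias as the relative difference between total forecast and total actuals; the function names and sample numbers are hypothetical.

```python
from typing import Sequence


def wmape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Weighted MAPE: total absolute error divided by total actual demand."""
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WMAPE is undefined when total actual demand is zero")
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return total_error / total_actual


def forecast_bias(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Relative bias: positive means over-forecasting, negative under-forecasting."""
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("Bias is undefined when total actual demand is zero")
    return (sum(forecast) - total_actual) / total_actual


# Hypothetical daily unit sales for one SKU-store pair.
actual_units = [10, 0, 5, 20]
forecast_units = [12, 1, 4, 15]
print(f"WMAPE: {wmape(actual_units, forecast_units):.3f}")
print(f"Bias:  {forecast_bias(actual_units, forecast_units):+.3f}")
```

Because the error is normalized by total demand rather than per observation, days with zero sales do not cause a division by zero, which is one reason WMAPE is often preferred over plain MAPE for sparse SKU-level series.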
Data Strategy & Feature Engineering:
Integrate diverse data sources into a unified feature store:
(1) Internal: Historical sales, product hierarchies, pricing, promotional calendars, inventory levels, and online search/click data.
(2) External: Calendar events (holidays, paydays), weather, local events, competitor activity (scraped), and macroeconomic trends.
System Architecture:
(1) Data Ingestion Layer: Batch and real-time streams.
(2) Processing & Feature Store: Clean, validate, and compute features.
(3) Modeling Layer: A repository for multiple models, allowing experimentation.
(4) Serving Layer: Exposes forecasts via APIs to downstream systems (replenishment, pricing).
(5) Monitoring & Feedback: Tracks model performance, data drift, and incorporates actual sales as ground truth for retraining.

Modeling Approach with Hierarchical Ensemble Strategy:
Use a “Top-Down, Bottom-Up” approach: forecast at the aggregate level (Category/Region) to capture macro-trends, then reconcile those aggregates with granular SKU (Stock Keeping Unit) level predictions to ensure total inventory alignment.
(1) Base Layer (Interpretability): Implement Prophet or Exponential Smoothing for high-level aggregates. This captures clear seasonalities (holidays, paydays) in a way that is easily explainable to business stakeholders.
(2) Granular Layer (The “Workhorse”): Use Global LightGBM or XGBoost models trained across entire product categories. This allows the model to learn shared patterns across similar items while efficiently handling categorical metadata like Store ID and Brand.
(3) High-Volatility Layer (Deep Learning): Deploy Temporal Fusion Transformers (TFT) or DeepAR specifically for high-volume or volatile items. These models capture complex, non-linear dependencies and multi-horizon temporal patterns that tree-based models might miss.
(4) Probabilistic Forecasting: Instead of a single point estimate, generate Quantile Forecasts (e.g., P10, P50, P90). This provides a range of uncertainty, allowing the logistics team to make data-driven decisions on safety stock levels.
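Quantile forecasts such as P10/P50/P90 are usually trained and evaluated with the pinball (quantile) loss. The sketch below shows that loss in plain Python; the function name and the sample demand values are hypothetical, and a real system would rely on the loss built into its modeling library.

```python
from typing import Sequence


def pinball_loss(actual: Sequence[float], predicted: Sequence[float],
                 quantile: float) -> float:
    """Mean pinball loss for a single quantile level (0 < quantile < 1)."""
    if not 0.0 < quantile < 1.0:
        raise ValueError("quantile must be strictly between 0 and 1")
    losses = []
    for y, q_hat in zip(actual, predicted):
        diff = y - q_hat
        # Under-prediction (diff > 0) is weighted by quantile,
        # over-prediction (diff < 0) by (1 - quantile).
        losses.append(max(quantile * diff, (quantile - 1.0) * diff))
    return sum(losses) / len(losses)


# Hypothetical weekly demand and two quantile forecasts for one SKU.
demand = [100, 120, 80]
p50_forecast = [100, 100, 100]
p90_forecast = [130, 130, 130]
print(pinball_loss(demand, p50_forecast, 0.5))
print(pinball_loss(demand, p90_forecast, 0.9))
```

For a high quantile such as P90 the loss penalizes under-prediction nine times more than over-prediction, which is why the resulting forecast sits above typical demand and can inform a safety stock level.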
January 3, 2026
MSD0006 Video Recommendation System
How would you design a scalable and personalized video recommendation system for a platform like YouTube, Netflix, or TikTok that can recommend relevant videos in real time to billions of users?
Answer
A modern recommendation system uses a multi-stage pipeline to narrow down billions of videos to a top-20 list for a user in milliseconds.
It typically consists of:
(1) Candidate Generation (filtering down to hundreds),
(2) Ranking (scoring those hundreds using deep learning), and
(3) Re-ranking (applying business logic for diversity, freshness, and safety, as well as ad insertion).

The pipeline is: Data Logging -> Candidate Generation -> Ranking -> Re-ranking -> Serving.
Data & Feature Preparation:
(1) User: watch history, watch time, likes, skips, follows
(2) Video: visual/audio/text embeddings, popularity, freshness
(3) Context: time of day, device, network
Candidate Generation (Retrieval):
This stage quickly reduces billions of videos to a manageable set (~100-500) from several sources; these sources are merged, deduplicated, and passed to the ranking stage:
(1) Collaborative Filtering (CF): Use matrix factorization or two-tower neural networks to create user and video embeddings. Retrieve videos similar to those the user has engaged with. This is the primary source.
(2) Content-Based: Use video title, description, audio, and frame embeddings to find videos similar to those the user likes.
(3) Seed-Based (Graph): For a “Watch Next” scenario, use the current video as a seed and find co-watched videos (e.g., “users who watched X also watched Y”).
(4) Trending/Global: Inject popular videos in the user’s region/language to promote freshness and viral content.
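The retrieval, merge, and deduplication steps above can be sketched as follows. This is a toy illustration, assuming user and video embeddings already exist and that similarity is a dot product; the function names, the round-robin merge policy, and the sample vectors are all hypothetical. At real scale an approximate nearest-neighbor index would typically replace the brute-force scan shown here.

```python
import heapq
from itertools import zip_longest
from typing import Dict, List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def top_k_by_embedding(user_vec: Sequence[float],
                       video_vecs: Dict[str, Sequence[float]],
                       k: int) -> List[str]:
    """Brute-force retrieval: the k videos with the highest dot-product score."""
    scored = ((dot(user_vec, vec), vid) for vid, vec in video_vecs.items())
    return [vid for _, vid in heapq.nlargest(k, scored)]


def merge_candidates(sources: List[List[str]], limit: int) -> List[str]:
    """Interleave candidate lists round-robin and drop duplicate video IDs."""
    merged, seen = [], set()
    for group in zip_longest(*sources):
        for vid in group:
            if vid is not None and vid not in seen:
                seen.add(vid)
                merged.append(vid)
    return merged[:limit]


# Hypothetical 2-dimensional embeddings.
video_vecs = {"v1": [1.0, 0.0], "v2": [0.8, 0.6], "v3": [0.0, 1.0], "v4": [-1.0, 0.0]}
user_vec = [0.9, 0.1]

cf_candidates = top_k_by_embedding(user_vec, video_vecs, k=2)
trending_candidates = ["v3", "v1"]  # stand-in for the trending source
candidates = merge_candidates([cf_candidates, trending_candidates], limit=3)
print(candidates)
```

Interleaving rather than concatenating keeps every source represented when the merged list is truncated to the candidate budget.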
Ranking (Scoring):
The goal is to precisely order the ~500 candidates from retrieval. Because it scores only a few hundred videos rather than billions, this model can be more complex and slower than the retrieval models.
(1) Deep Neural Networks (DNNs): The industry standard. Takes in hundreds of concatenated features (user, video, cross-features) through multiple fully-connected layers to output a single score (e.g., predicted watch time). Captures complex, non-linear interactions.
(2) Multi-Task Learning (MTL): A key advancement. Instead of predicting just one objective (e.g., click), a single model with shared hidden layers has multiple output heads (e.g., for click, watch time, like, share). This improves generalization by sharing signals between tasks and helps balance engagement with satisfaction.
(3) Sequence/Transformer Models: To model the user’s immediate session context, models can treat the sequence of recently watched videos as input (using RNNs or Transformers). This helps predict the “next best video” in the context of the current viewing mood.
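A multi-task model emits one prediction per head, so the ranker still needs a single number to sort by. One simple option, shown here as an assumption rather than something the text prescribes, is a weighted sum of the head outputs. The task names, weights, and predictions below are hypothetical.

```python
from typing import Dict, List


def combined_score(head_predictions: Dict[str, float],
                   weights: Dict[str, float]) -> float:
    """Collapse per-task predictions into one ranking score via a weighted sum."""
    return sum(weights[task] * head_predictions[task] for task in weights)


def rank_candidates(candidates: List[dict], weights: Dict[str, float]) -> List[str]:
    """Return candidate IDs ordered from highest to lowest combined score."""
    ordered = sorted(candidates,
                     key=lambda c: combined_score(c["heads"], weights),
                     reverse=True)
    return [c["id"] for c in ordered]


# Hypothetical weights: watch time counts more than a click.
weights = {"click": 1.0, "watch_fraction": 2.0, "like": 0.5}
candidates = [
    {"id": "a", "heads": {"click": 0.9, "watch_fraction": 0.1, "like": 0.0}},
    {"id": "b", "heads": {"click": 0.5, "watch_fraction": 0.6, "like": 0.2}},
    {"id": "c", "heads": {"click": 0.3, "watch_fraction": 0.3, "like": 0.1}},
]
print(rank_candidates(candidates, weights))
```

Video "a" has the highest click probability but ranks below "b" once watch fraction is weighted in, which illustrates how the weights trade off raw engagement against satisfaction.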

Re-ranking & Post-Processing:
Final polish of the list. Apply business and quality constraints such as diversity, freshness, safety filters, and exploration strategies before producing the final feed.
(1) Filters: Remove videos the user has already seen, filter out “shadow-banned” or inappropriate content.
(2) Diversity: Ensure the top 10 isn’t just one creator; inject different categories to avoid “filter bubbles.”
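The filter and diversity rules can be sketched as a single pass over the ranked list. This is a minimal illustration; the per-creator cap is one hypothetical way to enforce diversity, and the field names and IDs are made up.

```python
from typing import Dict, List, Set


def rerank(ranked: List[Dict[str, str]], seen_ids: Set[str],
           blocked_ids: Set[str], max_per_creator: int, top_n: int) -> List[str]:
    """Walk the ranked list in order, applying filters and a per-creator cap."""
    result: List[str] = []
    per_creator: Dict[str, int] = {}
    for video in ranked:
        vid, creator = video["id"], video["creator"]
        if vid in seen_ids or vid in blocked_ids:
            continue  # filter: already watched or not allowed
        if per_creator.get(creator, 0) >= max_per_creator:
            continue  # diversity: this creator already fills its quota
        per_creator[creator] = per_creator.get(creator, 0) + 1
        result.append(vid)
        if len(result) == top_n:
            break
    return result


ranked = [
    {"id": "a1", "creator": "A"},
    {"id": "a2", "creator": "A"},
    {"id": "a3", "creator": "A"},
    {"id": "b1", "creator": "B"},
    {"id": "c1", "creator": "C"},
]
feed = rerank(ranked, seen_ids={"a1"}, blocked_ids={"c1"},
              max_per_creator=1, top_n=3)
print(feed)  # ['a2', 'b1']
```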
January 2, 2026
MSD0005 Surveillance Video Anomaly Detection
How would you design an end-to-end surveillance system that automatically detects and alerts security personnel to ‘anomalous events’ (e.g., break-ins, fainting, or prohibited movements) in a large shopping mall?
Answer
A surveillance anomaly detection system captures video streams, preprocesses them into clips, and uses a deep learning model, typically a pretrained video backbone plus a lightweight anomaly scoring head, to identify unusual behavior.
It operates in a semi-supervised setup, trained only on normal data, and runs in real time using sliding windows and temporal smoothing.
The system also includes alerting, monitoring, and a human-in-the-loop feedback loop for calibration and retraining.
Data Ingestion & Preprocessing: Capture real-time video streams from multiple cameras. Preprocess by resizing frames and normalizing pixel values.
Model architecture:
(1) Feature Extraction: A 2D CNN (like EfficientNet) extracts spatial features. To capture motion, add Optical Flow, a 3D CNN (e.g., I3D), or a Video Transformer (e.g., Video Swin Transformer or TimeSformer) that processes blocks of frames together.
(2) The “Normal” Model: We train an Autoencoder or a Generative Adversarial Network (GAN) on months of “normal” mall activity.
(3) Detection Logic: When the model sees something new, its “reconstruction error” will be high. If the error exceeds a set threshold, it is flagged as an anomaly. Use the validation dataset for threshold calibration.
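The detection logic can be sketched without the model itself, taking per-clip reconstruction errors as given. This sketch assumes the threshold is a nearest-rank percentile of errors on normal validation clips and that temporal smoothing is a trailing moving average; both choices, along with the function names and sample values, are illustrative assumptions.

```python
import math
from typing import List, Sequence


def calibrate_threshold(normal_errors: Sequence[float],
                        percentile: float = 99.0) -> float:
    """Nearest-rank percentile of reconstruction errors on normal validation clips."""
    ordered = sorted(normal_errors)
    rank = math.ceil(percentile * len(ordered) / 100.0)
    return ordered[max(rank, 1) - 1]


def smooth(scores: Sequence[float], window: int) -> List[float]:
    """Trailing moving average, so a single noisy clip does not trigger an alert."""
    smoothed = []
    for i in range(len(scores)):
        chunk = scores[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def flag_anomalies(scores: Sequence[float], threshold: float,
                   window: int = 3) -> List[bool]:
    """Flag each clip whose smoothed error exceeds the calibrated threshold."""
    return [s > threshold for s in smooth(scores, window)]


# Hypothetical reconstruction errors.
validation_errors = [0.10, 0.12, 0.11, 0.15, 0.09, 0.13, 0.14, 0.10, 0.12, 0.20]
threshold = calibrate_threshold(validation_errors, percentile=90.0)
live_errors = [0.10, 0.18, 0.10, 0.40, 0.45, 0.50]
print(flag_anomalies(live_errors, threshold, window=3))
```

In this example the isolated spike of 0.18 exceeds the raw threshold but is averaged away, while the sustained rise at the end is flagged.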
Alerting & Visualization: Generate real-time alerts. Send anomalous frames for human operators to review. Implement a Human-in-the-Loop system where guards can click “Not an Anomaly.”

System Considerations:
(1) Scalability: Use edge devices for preliminary processing to reduce bandwidth; cloud processing for heavy computation.
(2) Latency: Optimize frame rate and model inference time to enable near real-time detection.
(3) Evaluation: Test using precision, recall, F1-score, and monitor false positives/negatives.
January 2, 2026
MSD0003 Spam Email Detection
Design an end-to-end Machine Learning system to effectively detect and filter spam emails in a high-volume email service.
Describe how you would design, train, and deploy this system.
Answer
The ML system for spam detection is a real-time classification pipeline. It begins with data collection and preprocessing (features extracted from text and metadata). A Supervised Learning model (e.g., Logistic Regression, Gradient Boosting, or a Neural Network) is trained on labeled data. The model is deployed as a real-time prediction service that intercepts incoming emails. Performance is monitored using metrics like precision and recall, and the model is continuously retrained to adapt to new spamming techniques (concept drift).
(1) Objectives and Metrics:
The goal is to classify incoming emails as spam or not spam in real time, minimizing false positives (mislabeling important emails as spam) while maintaining high recall (catching most spam).
(a) Primary Metric: Precision is critical. A high False Positive rate (marking legitimate emails as spam) is highly detrimental to user experience.
(b) Secondary Metric: Recall is also important to ensure most spam is caught (low False Negative rate).
(c) Evaluation Metric: The F1-score or Area Under the ROC Curve (AUC) provides a good balance.
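For reference, precision, recall, and F1 can be computed from binary labels as below. This is a minimal sketch with hypothetical labels, where 1 means spam and 0 means legitimate mail.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for the positive (spam = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical labels: one legitimate email flagged, one spam email missed.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(precision_recall_f1(y_true, y_pred))  # (0.75, 0.75, 0.75)
```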
(2) Data Collection:
(a) Sources: Historical emails labeled by users (spam / not spam). External spam datasets (e.g., Enron spam dataset).
(b) Features to collect:
Email text: subject and body.
Metadata: sender address, domain reputation, number of recipients.
Other: Embedded links, presence of attachments, message frequency.
(3) Data Preprocessing and Feature Engineering:
(a) Text cleaning: Remove HTML tags, URLs, punctuation.
(b) Tokenization: Split text into words/subwords (WordPiece).
(c) Textual Features Vectorization:
Classical: Term Frequency-Inverse Document Frequency (TF-IDF) or bag-of-words.
Modern: Pretrained embeddings (BERT, DistilBERT).
(d) Metadata Features Engineering:
Sender reputation score.
Ratio of uppercase words or spam keywords.
Number of links or suspicious domains.
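A few of the hand-crafted features listed above can be sketched with the standard library. The keyword list, the URL pattern, and the feature names are hypothetical placeholders; a production system would use a maintained keyword list and a proper URL parser.

```python
import re
from typing import Dict

URL_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)
SPAM_KEYWORDS = {"free", "winner", "urgent"}  # hypothetical keyword list


def metadata_features(subject: str, body: str) -> Dict[str, float]:
    """Derive simple numeric features from the raw subject and body text."""
    text = f"{subject} {body}"
    links = URL_PATTERN.findall(text)
    # Remove URLs so their characters do not count as words.
    words = re.findall(r"[A-Za-z]+", URL_PATTERN.sub(" ", text))
    n_words = len(words) or 1  # avoid division by zero on empty emails
    uppercase = sum(1 for w in words if len(w) > 1 and w.isupper())
    keyword_hits = sum(1 for w in words if w.lower() in SPAM_KEYWORDS)
    return {
        "num_links": len(links),
        "uppercase_ratio": uppercase / n_words,
        "spam_keyword_ratio": keyword_hits / n_words,
    }


features = metadata_features(
    "URGENT winner",
    "Claim your FREE prize at http://example.test/win now",
)
print(features)
```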
(4) Model Selection and Training:
(a) Baseline Models: Start with Naive Bayes or Logistic Regression.
(b) Advanced Models: Ensemble methods like Random Forest or XGBoost, deep learning with CNNs/RNNs on text sequences, or pre-trained transformers like BERT for state-of-the-art performance.
(c) Training Process: Hold out a separate test set for final evaluation, and use K-Fold Cross-Validation on the remaining historical data for model selection and tuning. For large-scale data, use distributed training with TensorFlow or PyTorch on GPUs.
(5) Deployment & Inference:
(a) Deployment Architecture: The trained model is saved (e.g., in a model registry) and loaded into a low-latency prediction service.
(b) Inference Flow:
The inference flow is shown in the figure below.

Step 1: The mail server receives incoming email.
Step 2: The email content and headers are passed to the Spam Prediction Service API.
Step 3: The service performs real-time feature extraction and feeds the feature vector to the loaded model.
Step 4: The model returns a spam probability score (e.g., 0.95).
Step 5: A threshold is applied (e.g., a score greater than 0.8 is classified as spam).
Final Action: If classified as spam, the email is moved to the user’s spam folder; otherwise, it goes to the inbox.
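Steps 3 to 5 and the final action can be sketched as one routing function. The feature extractor and model are passed in as callables so the sketch stays independent of any framework; `count_free_mentions` and `toy_model` are hypothetical stand-ins, and only the 0.8 threshold comes from the example in the text.

```python
from typing import Callable, Dict

SPAM_THRESHOLD = 0.8  # example threshold from the steps above


def route_email(email: Dict[str, str],
                extract_features: Callable[[Dict[str, str]], Dict[str, float]],
                predict_proba: Callable[[Dict[str, float]], float],
                threshold: float = SPAM_THRESHOLD) -> str:
    """Extract features, score the email, and pick the destination folder."""
    features = extract_features(email)
    score = predict_proba(features)
    return "spam_folder" if score > threshold else "inbox"


def count_free_mentions(email: Dict[str, str]) -> Dict[str, float]:
    """Stand-in feature extractor: counts occurrences of the word 'free'."""
    text = (email["subject"] + " " + email["body"]).lower()
    return {"free_count": float(text.count("free"))}


def toy_model(features: Dict[str, float]) -> float:
    """Stand-in for the loaded model: maps the count to a pseudo-probability."""
    return min(1.0, 0.05 + 0.45 * features["free_count"])


spam = {"subject": "FREE prize", "body": "Claim your free gift"}
ham = {"subject": "Meeting notes", "body": "See the agenda for Monday"}
print(route_email(spam, count_free_mentions, toy_model))
print(route_email(ham, count_free_mentions, toy_model))
```

Keeping the threshold as a parameter makes it easy to tune the precision/recall trade-off without retraining the model.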
(6) Maintenance and Monitoring:
One critical part of a spam system is its ability to adapt to Concept Drift, since spammers constantly change their tactics.
(a) Performance Monitoring: Track and alert on key metrics.
User Feedback: Explicit ‘Mark as Spam’ or ‘Not Spam’ actions are the best source of new labeled data.
Model Accuracy: Monitor Precision, Recall, and F1-score daily.
Prediction Drift: Monitor the distribution of prediction scores. A sudden drop in the average predicted spam score might indicate the model is no longer effective.
(b) Retraining Pipeline: Implement a Continuous Training pipeline.
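One way to monitor prediction drift and trigger retraining is the Population Stability Index (PSI) between a baseline and a recent window of spam scores. The sketch below is an illustration under assumptions: the text does not name PSI, the scores are assumed to lie in [0, 1], and the 0.25 alert level is a value to tune, not one taken from the text.

```python
import math
from typing import List, Sequence


def score_histogram(scores: Sequence[float], bins: int, eps: float) -> List[float]:
    """Fraction of scores in each equal-width bin over [0, 1], floored at eps."""
    counts = [0] * bins
    for s in scores:
        counts[min(int(s * bins), bins - 1)] += 1
    return [max(c / len(scores), eps) for c in counts]


def population_stability_index(baseline: Sequence[float],
                               recent: Sequence[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI = sum over bins of (recent - baseline) * ln(recent / baseline)."""
    b = score_histogram(baseline, bins, eps)
    r = score_histogram(recent, bins, eps)
    return sum((ri - bi) * math.log(ri / bi) for bi, ri in zip(b, r))


PSI_ALERT_LEVEL = 0.25  # hypothetical alert level

# Hypothetical score samples: the share of high spam scores drops from 20% to 2%.
baseline_scores = [0.05] * 80 + [0.95] * 20
recent_scores = [0.05] * 98 + [0.95] * 2
psi = population_stability_index(baseline_scores, recent_scores)
if psi > PSI_ALERT_LEVEL:
    print(f"PSI={psi:.3f}: score distribution shifted, consider retraining")
```

A drift alert does not prove the model is wrong, since spam volume itself can change, so it is best treated as a prompt to check labeled feedback before retraining.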
November 2, 2025
MSD0001 Real-Time Factory Product Inspection
You are tasked with designing and deploying a deep learning-based computer vision system for real-time quality control on a high-speed manufacturing assembly line. The system must classify each product as ‘Pass’ or ‘Fail’ due to surface defects (scratches, cracks, misalignments).
Describe the complete end-to-end system design, from data acquisition and model selection to deployment and post-deployment maintenance.
Crucially, how would you address the challenges of real-time inference speed and the severe class imbalance that arises because defects are rare?
Answer
The solution is an Edge-AI Computer Vision Pipeline. It starts with a controlled imaging setup to capture high-quality, consistent images. The core is a lightweight CNN (e.g., MobileNet) leveraging Transfer Learning, with a specialized loss function (e.g., Focal Loss) to handle class imbalance. Deployment occurs on a local Edge GPU to guarantee low-latency inference. A continuous MLOps loop monitors performance and facilitates model retraining against new or subtle defects (concept drift).
(1) Data & Setup: Use a controlled environment (lighting/staging) and high-resolution cameras, and apply Transfer Learning to reduce the need for large-scale data collection.
(2) Imbalance Handling: Use Focal Loss or weighted loss functions, combined with heavy data augmentation and oversampling of the ‘Fail’ class.
(3) Model Architecture: Choose a lightweight CNN (e.g., MobileNetV2, EfficientNet-B0) optimized for speed over a very large, deep network.
(4) Real-Time Deployment: Edge deployment on an industrial GPU (e.g., NVIDIA Jetson), with the model exported and optimized (e.g., ONNX export, TensorRT optimization and quantization) to target sub-100ms inference.
(5) Post-Deployment MLOps: Implement a feedback loop for logging all classifications (especially False Negatives) and trigger periodic retraining to combat model drift.
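To make the imbalance handling in item (2) concrete, here is a per-example sketch of binary focal loss in plain Python. The default values gamma = 2.0 and alpha = 0.25 follow those reported in the original focal loss paper and should be treated as starting points to tune; the function name is hypothetical, and a real training loop would use a vectorized version in the chosen deep learning framework.

```python
import math


def binary_focal_loss(p_fail: float, label: int,
                      gamma: float = 2.0, alpha: float = 0.25,
                      eps: float = 1e-7) -> float:
    """Focal loss for one example; label 1 = 'Fail' (defect), 0 = 'Pass'.

    FL = -alpha_t * (1 - p_t) ** gamma * log(p_t), where p_t is the
    probability the model assigns to the true class.
    """
    p = min(max(p_fail, eps), 1.0 - eps)  # clamp to keep log() finite
    p_t = p if label == 1 else 1.0 - p
    alpha_t = alpha if label == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# The model predicts a 5% defect probability in both cases.
easy_pass = binary_focal_loss(0.05, label=0)    # correct and confident
missed_defect = binary_focal_loss(0.05, label=1)  # defect the model missed
print(f"easy pass: {easy_pass:.6f}, missed defect: {missed_defect:.4f}")
```

The (1 - p_t) ** gamma factor shrinks the loss of well-classified 'Pass' examples toward zero, so the abundant easy examples no longer dominate the gradient and the rare, hard defects carry most of the training signal.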
October 30, 2025