Tag: Basics

  • ML0023 Gradient Descent

    What is Gradient Descent in machine learning?

    Answer

    Gradient descent is an iterative optimization algorithm used to minimize a function, most commonly a cost or loss function in machine learning, by moving step-by-step in the direction of the steepest descent (i.e., opposite to the gradient).
    In each iteration, the algorithm computes the gradient of the function with respect to its parameters, then updates the parameters by subtracting a fraction (the learning rate) of this gradient.
    This process is repeated until the function converges to a minimum (which, for convex functions, is the global minimum) or until the updates become negligibly small.

    The update rule for gradient descent:
    θ = θ − α ∇J(θ)
    where:
    θ represents the parameters being optimized (for example, the weights in a model).
    α is the learning rate.
    ∇J(θ) is the gradient of the cost function J(θ) with respect to θ.
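    As a minimal sketch, the update rule can be applied to a toy one-parameter cost J(θ) = (θ − 3)², whose gradient is 2(θ − 3) and whose minimum is at θ = 3 (the cost function and helper name here are illustrative assumptions, not from the original):

```python
def gradient_descent(theta0, alpha=0.1, steps=100):
    """Toy gradient descent on J(theta) = (theta - 3)**2."""
    theta = theta0
    for _ in range(steps):
        grad = 2 * (theta - 3)        # gradient of the cost: ∇J(θ)
        theta = theta - alpha * grad  # update rule: θ ← θ − α ∇J(θ)
    return theta

print(round(gradient_descent(0.0), 4))  # converges toward 3.0
```

    With a suitable learning rate the iterates approach the minimizer; too large an α would make the updates overshoot and diverge.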



  • ML0020 Data Split

    How to split the dataset?

    Answer

    A good data split in machine learning ensures that the model is trained, validated, and tested effectively to generalize well on unseen data.
    The typical approach involves dividing the dataset into three sets: Training Set, Validation Set, and Test Set.

    Training Set: Used to train the machine learning model. The model learns patterns and relationships in the data from this set.
    Validation Set: Used to tune hyperparameters of the model and evaluate its performance during training. This helps prevent overfitting to the training data and allows you to select the best model configuration.  
    Test Set: Used for a final, unbiased evaluation of the trained model’s performance on completely unseen data. This provides an estimate of how well the model will generalize to new, real-world data.

    Stratification for Imbalanced Data: For imbalanced datasets, consider using stratified splits to maintain the same proportion of classes across the training and test sets.
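    A shuffled three-way split can be sketched as follows; `split_dataset` is a hypothetical helper, and in practice libraries such as scikit-learn's `train_test_split` (which also offers a `stratify` parameter for imbalanced data) are commonly used instead:

```python
import random

def split_dataset(data, train=0.7, val=0.15, seed=42):
    """Shuffle and split into train/validation/test sets (toy sketch)."""
    items = list(data)
    random.Random(seed).shuffle(items)  # fixed seed for reproducibility
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(100))
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```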


  • ML0017 Data Augmentation

    What are the common data augmentation techniques?

    Answer

    Data augmentation refers to techniques used to increase the diversity and size of a training dataset by creating modified versions of the existing data. It’s especially popular in applications like computer vision and natural language processing, where collecting large datasets can be expensive or time-consuming.

    Common Techniques:
    Computer Vision:

    Geometric Transformations: Rotate, flip, crop, or scale images
    Color Adjustments: Change brightness, contrast, saturation, or apply color jittering.
    Noise Injection: Add random noise or blur to images.

    Natural Language Processing:
    Synonym Replacement: Replace words with their synonyms.
    Back Translation: Translate text to another language and back.
    Random Insertion/Deletion: Add/remove words randomly.

    Tabular Data:
    SMOTE (Synthetic Minority Oversampling Technique): Generate synthetic data points for minority classes.
    Noise Injection: Add small random noise to numeric features.
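    Noise injection for tabular data can be sketched as follows; `add_gaussian_noise` is a hypothetical helper that perturbs each numeric feature with small Gaussian noise to create additional training rows:

```python
import random

def add_gaussian_noise(rows, sigma=0.01, seed=0):
    """Return a noisy copy of each row of numeric features (toy sketch)."""
    rng = random.Random(seed)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in rows]

original = [[1.0, 2.0], [3.0, 4.0]]
augmented = add_gaussian_noise(original)
```

    The noise scale `sigma` should be small relative to the feature scale, so the augmented rows stay plausible; normalizing features first makes a single `sigma` easier to choose.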


  • ML0010 Epoch Selection

    What are effective strategies for selecting the appropriate number of training epochs in machine learning?

    Answer

    An epoch represents one complete pass through the entire training dataset; each additional epoch gives the model another round of iterative weight updates over every training sample.

    Choosing the right number of epochs involves striking a balance between undertraining and overfitting.
    (1) Monitor Validation Metrics: Regularly evaluate performance on a validation set. If the validation loss begins to plateau or increase, it may indicate that further training won’t yield improvements.
    (2) Implement Early Stopping: Use early stopping techniques to automatically halt training when the model’s performance ceases to improve, thereby avoiding overfitting.
    (3) Experimentation: Begin with a moderate range (e.g., 10–100 epochs) and adjust based on observed training and validation curves.
    (4) Assess Model and Data Complexity: More intricate models or complex datasets may require additional epochs to capture underlying patterns, while simpler scenarios can converge more rapidly.

    In short, select epoch sizes by closely monitoring the model’s performance, employing early stopping to refine the process, and tailoring the approach to the complexities of the specific task.
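    Early stopping (strategy 2 above) can be sketched as follows; `early_stopping` is a hypothetical helper that scans a list of per-epoch validation losses and reports the epoch at which patience runs out:

```python
def early_stopping(val_losses, patience=3):
    """Return the epoch index at which training should stop, i.e. once
    validation loss has failed to improve for `patience` epochs."""
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, best_epoch = loss, epoch      # new best: reset patience
        elif epoch - best_epoch >= patience:
            return epoch                        # patience exhausted
    return len(val_losses) - 1                  # never triggered

losses = [1.0, 0.8, 0.7, 0.72, 0.71, 0.73, 0.74]
print(early_stopping(losses))  # 5: three epochs with no improvement after epoch 2
```

    In practice one would also restore the weights saved at the best epoch, as framework callbacks (e.g. Keras's `EarlyStopping`) typically do.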



  • ML0009 Batch Size Selection

    What are the best strategies for selecting the appropriate batch size?

    Answer

    Selecting an appropriate batch size is another crucial hyperparameter choice in neural network training, impacting both performance and training efficiency. Select a batch size that balances your hardware constraints and the optimization trade-offs. Start with a moderate value (e.g., 16, 32, or 64) and adjust it based on the available memory, the stability of your gradient updates, and the model’s validation performance.

    (1) Memory Constraints: Larger batch sizes require more GPU (or CPU) memory.
    (2) Dataset Size: Larger datasets can generally accommodate larger batch sizes. Smaller datasets may benefit from smaller batch sizes to introduce more variability in the training process.
    (3) Learning Rate Interaction: The appropriate learning rate often depends on the batch size; large batches might allow or even require a higher learning rate, while small batches might need a lower one. Some practitioners adjust the learning rate proportionally to the batch size.
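    One common heuristic for the learning-rate interaction in point (3) is the linear scaling rule: scale the learning rate in proportion to the batch size. This sketch (with a hypothetical `scaled_learning_rate` helper) is an approximation that often needs warmup or further tuning, not a guarantee:

```python
def scaled_learning_rate(base_lr, base_batch, batch):
    """Linear scaling heuristic: lr grows proportionally with batch size."""
    return base_lr * batch / base_batch

# If lr = 0.1 worked at batch size 32, try 0.4 at batch size 128.
print(scaled_learning_rate(0.1, 32, 128))  # 0.4
```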



  • ML0008 Learning Rate Selection

    What are the best practices for selecting an optimal learning rate?

    Answer

    Selecting an appropriate learning rate is a crucial part of training neural networks. It significantly impacts how quickly and effectively your model learns.

    1. Grid or Random Search: Experiment with a range of learning rates (for example, 0.0001, 0.001, 0.01, etc.) and observe training performance. This systematic exploration can help narrow down an effective learning rate, though it may be computationally intensive.
    2. Adaptive Optimizers: Use optimizers like Adam, RMSProp, or Adagrad that adjust the learning rate for each parameter automatically. These methods often require less manual tuning because they adapt based on the gradient history.
    3. Learning Rate Schedules: Implement strategies such as step decay, exponential decay, or cosine annealing. These schedules reduce the learning rate over time, which can help fine-tune the model as it approaches convergence.
    4. Monitoring Training Loss: Pay close attention to the training loss during training. If the loss is not decreasing, or if it’s oscillating, adjust the learning rate accordingly.  
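    A step-decay schedule (strategy 3 above) can be sketched as follows; `step_decay` is a hypothetical helper, and the drop factor and interval are arbitrary example values:

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Step decay: multiply the learning rate by `drop` every
    `epochs_per_drop` epochs."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

print(step_decay(0.1, 0))   # 0.1   (epochs 0-9)
print(step_decay(0.1, 25))  # 0.025 (two drops applied)
```

    Exponential decay and cosine annealing follow the same pattern, differing only in the formula that maps epoch number to learning rate.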


  • ML0007 Dropout

    What is dropout in neural network training?

    Answer

    Dropout is a regularization technique used during neural network training to prevent overfitting.
    During each training step, a fraction of the neurons (and their corresponding connections) are randomly “dropped out” (i.e., their activations are set to zero). This forces the network to learn more robust features because it can’t rely on any single neuron; instead, it learns distributed representations by effectively training an ensemble of smaller sub-networks. This improves the model’s ability to generalize to unseen data.
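    A minimal sketch of inverted dropout, the variant commonly used in practice: survivors are rescaled by 1/(1 − p) at training time so the expected activation is unchanged and no scaling is needed at inference (the `dropout` helper below is illustrative, not a framework API):

```python
import random

def dropout(activations, p=0.5, training=True, seed=0):
    """Zero each activation with probability p and rescale survivors
    by 1/(1 - p); identity function at inference time."""
    if not training or p == 0.0:
        return list(activations)
    rng = random.Random(seed)
    return [0.0 if rng.random() < p else a / (1.0 - p) for a in activations]

out = dropout([1.0, 2.0, 3.0, 4.0], p=0.5)
```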


  • ML0002 Machine Learning Type

    What is the difference between supervised learning and unsupervised learning?

    Answer

    Supervised learning relies on labeled datasets: each training sample comes with a label or output. The algorithm learns a mapping from inputs to outputs that can then predict the output for new, unseen inputs.

    Unsupervised learning works with unlabeled data. The algorithm aims to find hidden patterns or structures within the data, for example by clustering similar samples or reducing dimensionality.


  • ML0001 Loss Curve Plot

    The following training loss curves were plotted with different experiment settings. Which of these training loss curves most likely indicates the correct experiment settings?

    Answer

    A
    Explanation:
    In an ideal training setup, the training loss is expected to decrease steadily, indicating that the model is learning and improving its performance as training progresses.

