Tag: Tree

  • ML0065 Random Forest III

    How to choose the number of features in a random forest?

    Answer

    Select the number of features (m) using rules of thumb (default heuristics), then tune via cross-validation or out-of-bag (OOB) error to find the best value for your specific dataset.

    Default Heuristics:
    Classification: m = \sqrt{p}
    Regression: m = \frac{p}{3}
    Where:
    p = total number of features,
    m = number of features considered at each split.
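    These rules of thumb are easy to compute directly. A minimal sketch (the feature count p = 20 is a hypothetical value for illustration):

```python
import math

# Rule-of-thumb starting points for max_features, given p total features.
p = 20  # hypothetical total feature count

m_classification = round(math.sqrt(p))  # sqrt(p) heuristic for classification
m_regression = max(1, p // 3)           # p/3 heuristic for regression

print(m_classification)  # 4
print(m_regression)      # 6
```

    These values are starting points only; the tuning methods below refine them for a specific dataset.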

    Bias-Variance Trade-off:
    (1) Smaller max_features will increase randomness, leading to less correlated trees (reducing variance) but potentially higher bias.
    (2) Larger max_features will decrease randomness, leading to more correlated trees (increasing variance) but potentially lower bias.

    Grid Search/Randomized Search:
    This is the most robust method. Define a range of possible max_features values and use cross-validation to evaluate the model’s performance for each value.

    Out-of-Bag (OOB) Error:
    Random Forests can estimate the generalization error internally using OOB samples. You can monitor the OOB error as you vary max_features to find the optimal value.
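    A minimal sketch of an OOB-based sweep over max_features, assuming scikit-learn and a synthetic dataset (the candidate values and dataset parameters are illustrative assumptions, not prescriptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical synthetic dataset: 500 samples, p = 20 features.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=8, random_state=0)

p = X.shape[1]
candidates = [2, int(np.sqrt(p)), 8, p]  # sweep around the sqrt(p) heuristic

oob_errors = {}
for m in candidates:
    rf = RandomForestClassifier(n_estimators=200, max_features=m,
                                oob_score=True, random_state=0)
    rf.fit(X, y)
    oob_errors[m] = 1.0 - rf.oob_score_  # OOB error = 1 - OOB accuracy

best_m = min(oob_errors, key=oob_errors.get)
print(oob_errors, best_m)
```

    Because the OOB estimate comes for free with bagging, this avoids the extra model fits a full cross-validation grid search would require.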

    The figure below shows the cross-validation accuracy curve when using different numbers of features.


  • ML0064 Random Forest II

    Please explain the benefits and drawbacks of random forest.

    Answer

    Random Forest is a powerful ensemble method that reduces overfitting and improves predictive accuracy by combining many decision trees. However, it trades interpretability and computational efficiency for these benefits and may require careful tuning when dealing with large, imbalanced, or sparse datasets.

    Benefits of random forest:
    (1) Reduces Overfitting: Aggregating many trees lowers variance.
    (2) Robust to Noise and Outliers: Less sensitive to anomalous data.
    (3) Handles High Dimensionality: Works well with many input features.
    (4) Estimates Feature Importance: Helps identify influential variables.
    (5) Built-in Bagging: Bootstrap sampling improves generalization.

    Drawbacks of random forest:
    (1) Less Interpretability: Hard to visualize or explain compared to a single decision tree.
    (2) Computational Cost: Training and prediction can be slower with many trees.
    (3) Memory Usage: Large forests can consume significant resources.
    (4) Biased with Imbalanced Data: With skewed class distributions, predictions tend to favor the majority class unless class weights or resampling are used.
    (5) Not Always Optimal for Sparse Data: May underperform compared to other algorithms on very sparse datasets.

    The example below demonstrates that a random forest can underperform on an imbalanced dataset.
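    A minimal sketch of the imbalanced-data drawback, assuming scikit-learn and a synthetic 95/5 binary dataset (the dataset parameters are illustrative assumptions). It compares minority-class recall for a default forest against one using class_weight="balanced":

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced dataset: roughly 95% class 0, 5% class 1.
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

plain = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
weighted = RandomForestClassifier(n_estimators=100, class_weight="balanced",
                                  random_state=0).fit(Xtr, ytr)

# Minority-class recall: a default forest often favors the majority class.
r_plain = recall_score(yte, plain.predict(Xte))
r_weighted = recall_score(yte, weighted.predict(Xte))
print(r_plain, r_weighted)
```

    On many imbalanced problems the unweighted forest posts high overall accuracy while missing a large share of minority-class samples, which is why recall (not accuracy) is the metric to watch here.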



  • ML0063 Random Forest

    How does the random forest algorithm operate? Please outline its key steps.

    Answer

    Random Forest builds an ensemble of decision trees using bootstrapped samples and random feature subsets at each split. This combination reduces variance, combats overfitting, and improves predictive accuracy. The final output aggregates the predictions of all trees (majority vote for classification, averaging for regression).
    (1) Bootstrap Sampling: Create multiple subsets of the original training data by sampling with replacement (bootstrap samples).
    (2) Grow Decision Trees: For each bootstrap sample, train an unpruned decision tree.
    (3) Random Feature Selection: At each split in a tree, randomly select a subset of features. The split is chosen only among this random subset (increases diversity).
    (4) Aggregate Results with Voting or Averaging:
    Classification: Each tree votes for a class label. The majority vote is used.
    \hat{y} = \mathrm{mode}\,\{\,T_b(x)\,\},\quad b=1,\ldots,B
    Where:
     T_b(x) = prediction of the b-th tree.
     B = total number of trees.

    Regression: Each tree predicts a numeric value. The average is used.
    \hat{y} = \frac{1}{B}\sum_{b=1}^{B} T_b(x)
    Where:
     T_b(x) = prediction of the b-th tree.
     B = total number of trees.
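    The four steps above can be sketched by hand for the regression case, mirroring \hat{y} = \frac{1}{B}\sum_{b=1}^{B} T_b(x) (assuming scikit-learn trees and a synthetic dataset; B = 50 is an illustrative choice):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

B = 50
trees = []
for b in range(B):
    idx = rng.integers(0, len(X), len(X))            # (1) bootstrap sample
    t = DecisionTreeRegressor(max_features="sqrt",   # (3) random feature subset per split
                              random_state=b)
    t.fit(X[idx], y[idx])                            # (2) grow an unpruned tree
    trees.append(t)

# (4) aggregate: average the B tree predictions T_b(x)
preds = np.mean([t.predict(X) for t in trees], axis=0)
```

    For classification the only change in step (4) is taking the majority vote across trees instead of the mean.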

    The example below shows the decision boundary differences between three decision trees and their random forest ensemble.


  • ML0062 Decision Tree

    Please explain how a decision tree works.

    Answer

    A decision tree partitions the input space into regions by recursively splitting on features that best separate the target variable. Each split aims to improve the “purity” of the resulting subsets, as measured by criteria such as Gini impurity or Entropy. Predictions are made by following the sequence of splits down to a leaf node and returning the most common class (classification) or average target (regression).

    Structure: A tree of nodes where each internal node tests a feature, branches represent feature outcomes, and leaves give predictions.

    Splitting Criterion: Chooses the best feature (and threshold) by maximizing the purity improvement, e.g., Information Gain, reduction in Gini Impurity, or Variance Reduction.

    Recursive Growth: Starting at the root, data is split, then the process recurses on each subset until stopping criteria (max depth, min samples, or pure leaves) are met.

    Prediction: A new sample “travels” from root to leaf by following feature-test branches; the leaf’s label or value is returned.
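    The structure, splitting, growth, and prediction steps above can be seen concretely in a small tree, assuming scikit-learn and a synthetic two-feature dataset (the depth limit and dataset are illustrative assumptions):

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier, export_text

# Two-feature toy dataset; the tree recursively splits on x0/x1 thresholds.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

clf = DecisionTreeClassifier(criterion="gini",  # Gini impurity as the split criterion
                             max_depth=3,       # stopping criterion: max depth
                             random_state=0)
clf.fit(X, y)

# Each internal node tests one feature against a threshold; leaves hold the
# predicted class. A new sample follows these tests from root to leaf.
print(export_text(clf, feature_names=["x0", "x1"]))
print(clf.score(X, y))
```

    The printed rules make the "travel from root to leaf" idea explicit: prediction is just a sequence of threshold comparisons ending at a leaf's majority class.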

    The example below demonstrates using a Decision Tree on a 2-feature dataset for classification.

