ML0063 Random Forest

How does the random forest algorithm operate? Please outline its key steps.

Answer

Random Forest builds an ensemble of decision trees, training each tree on a bootstrap sample of the data and considering only a random subset of features at each split. Because the resulting trees are less correlated with one another, combining them reduces variance and overfitting compared with a single deep tree, which typically improves predictive accuracy. The final output aggregates the predictions of all trees (majority vote for classification, averaging for regression).
(1) Bootstrap Sampling: Create $B$ training sets by sampling the original training data with replacement (bootstrap samples). Each sample is typically the same size as the original data, so some rows appear more than once and others are left out.
(2) Grow Decision Trees: Train one unpruned decision tree on each bootstrap sample.
(3) Random Feature Selection: While growing each tree, randomly select a subset of features at every split and choose the best split only from that subset. This makes the trees less correlated with one another and increases diversity.
(4) Aggregate Results with Voting or Averaging:
Classification: Each tree votes for a class label. The majority vote is used.
$\hat{y} = \mathrm{mode}\,\{ T_b(x) \}_{b=1}^{B}$
Where:
$T_b(x)$ = prediction of the b-th tree.
$B$ = total number of trees.

Regression: Each tree predicts a numeric value. The average is used.
$\hat{y} = \frac{1}{B}\sum_{b=1}^{B} T_b(x)$
Where:
$T_b(x)$ = prediction of the b-th tree.
$B$ = total number of trees.
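
The four steps above can be sketched in plain Python. This is a minimal illustration for classification only, not a reference implementation: the helper names (`gini`, `best_split`, `grow_tree`, `fit_forest`, `predict_forest`), the parameter `n_sub` (number of features tried per split), the use of Gini impurity as the split criterion, and the toy data are all assumptions made for the example. Class labels must not be tuples, because tuples are used to represent internal nodes.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, feature_ids):
    """Best (feature, threshold) among the candidate features, or None."""
    best, best_score = None, gini(y)  # a split must beat the parent impurity
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [y[i] for i, row in enumerate(X) if row[f] <= t]
            right = [y[i] for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(X, y, n_sub, rng):
    """Steps 2 and 3: unpruned tree with a fresh random feature subset per split."""
    if len(set(y)) == 1:
        return y[0]  # pure leaf
    feature_ids = rng.sample(range(len(X[0])), n_sub)
    split = best_split(X, y, feature_ids)
    if split is None:
        # No improving split in this subset: stop with a majority-class leaf.
        return Counter(y).most_common(1)[0][0]
    f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            grow_tree([X[i] for i in li], [y[i] for i in li], n_sub, rng),
            grow_tree([X[i] for i in ri], [y[i] for i in ri], n_sub, rng))

def predict_tree(node, x):
    """Walk from the root to a leaf; internal nodes are tuples, leaves are labels."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, n_sub=1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # step 1: bootstrap sample
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], n_sub, rng))
    return forest

def predict_forest(forest, x):
    """Step 4: majority vote over the trees."""
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]

# Toy data (made up for illustration): two well-separated clusters.
X = [[1.0, 1.2], [1.5, 0.8], [0.9, 1.9], [2.0, 1.1],
     [6.0, 6.5], [6.8, 5.9], [7.2, 7.0], [5.9, 7.1]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

forest = fit_forest(X, y, n_trees=25, n_sub=1, seed=0)
print(predict_forest(forest, [0.5, 0.5]), predict_forest(forest, [8.0, 8.0]))
```

For regression, the same structure would apply with a variance-based split criterion, mean-valued leaves, and `predict_forest` returning the average of the tree predictions instead of the most common vote.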

The example below shows the decision boundary differences between three decision trees and their random forest ensemble.
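
A comparison of this kind can be approximated numerically, assuming scikit-learn and NumPy are available. The sketch below fits a deliberately tiny forest of three trees and measures how often each tree's prediction differs from the ensemble's on a grid of points. The data set (`make_moons`), the noise level, and the grid resolution are arbitrary choices made for illustration.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

# Toy two-class data set (an assumption; any 2-D data would do).
X, y = make_moons(n_samples=300, noise=0.3, random_state=0)

# Three trees only, so each one can be inspected individually.
forest = RandomForestClassifier(n_estimators=3, random_state=0).fit(X, y)

# Regular grid covering the range of the data.
xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)
ys = np.linspace(X[:, 1].min(), X[:, 1].max(), 50)
grid = np.array([[a, b] for a in xs for b in ys])

forest_pred = forest.predict(grid)
# The fitted trees predict encoded class indices, which coincide with the
# labels here because the labels are already 0 and 1.
tree_preds = [tree.predict(grid) for tree in forest.estimators_]

for i, pred in enumerate(tree_preds, start=1):
    differ = np.mean(pred != forest_pred)
    print(f"tree {i}: differs from the forest on {differ:.1%} of grid points")
```

Note that scikit-learn's `RandomForestClassifier` combines trees by averaging their predicted class probabilities rather than counting hard votes. With fully grown trees the two usually agree, but they are not the same rule in general.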
