Random Forest

Random Forest is a supervised learning algorithm used for both classification and regression. As the name suggests, it builds a "forest" of many decision trees, each grown with some randomness.

Random Forest builds multiple decision trees and combines their predictions to get a more accurate and stable result. The forest it builds is a collection of decision trees, usually trained with the bagging method.

A single decision tree is built on the entire data set, making use of all the predictor variables. A Random forest, by contrast, is an ensemble of decision trees: each tree is trained on a random sample of the data and considers only a randomly selected subset of the features (in the standard algorithm, a fresh subset is drawn at each split).

Why use Random forest?

Even though decision trees are convenient and easy to implement, they often lack accuracy on unseen data. A decision tree can fit its training data very well, but it is not flexible when it comes to classifying new samples. This is due to overfitting, which reduces accuracy on test data.

This is where Random forest comes in. It is based on the idea of bagging (bootstrap aggregating): training multiple decision trees on different random samples of the data set, drawn with replacement, and combining their results.
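
The sampling half of bagging can be sketched in a few lines of plain Python. This is only an illustration: the helper name `bootstrap_sample` and the toy data are assumptions made for this example, not part of any library.

```python
import random

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows uniformly at random, with replacement."""
    return [rng.choice(rows) for _ in rows]

rng = random.Random(0)   # fixed seed so the run is repeatable
data = list(range(10))   # toy data set: ten rows, identified by number

# Three bootstrap samples; each would be used to train one tree
samples = [bootstrap_sample(data, rng) for _ in range(3)]
for sample in samples:
    print(sorted(sample))
```

Because rows are drawn with replacement, a sample usually repeats some rows and leaves others out, so each tree sees a slightly different view of the data.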

Working of Random forest:

  • Step 1: Select random samples (drawn with replacement) from the given data set.
  • Step 2: Construct a decision tree for each sample and get a prediction from each tree.
  • Step 3: Collect a vote for each predicted result.
  • Step 4: Select the most-voted prediction as the final result (for regression, the tree predictions are averaged instead).
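
The four steps above can be sketched in plain Python. To keep it short, this sketch makes several simplifying assumptions: each "tree" is a one-split stump rather than a full decision tree, each stump gets one random feature for the whole tree, and all function names and the toy data are hypothetical. A real project would more likely use a library implementation.

```python
import random
from collections import Counter

def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(rows, labels, feature_ids):
    """Fit a one-split tree: choose the (feature, threshold) with the fewest errors."""
    fallback = majority(labels)
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= t]
            right = [y for row, y in zip(rows, labels) if row[f] > t]
            left_label = majority(left)  # never empty: the row that supplied t is here
            right_label = majority(right) if right else fallback
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return f, t, left_label, right_label

def predict_stump(stump, row):
    f, t, left_label, right_label = stump
    return left_label if row[f] <= t else right_label

def fit_forest(rows, labels, n_trees, n_features, rng):
    forest = []
    n = len(rows)
    for _ in range(n_trees):
        # Step 1: random sample of the rows, drawn with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Random subset of features for this tree
        feature_ids = rng.sample(range(len(rows[0])), n_features)
        # Step 2: one tree per sample
        forest.append(fit_stump(sample_rows, sample_labels, feature_ids))
    return forest

def predict_forest(forest, row):
    # Step 3: every tree votes
    votes = Counter(predict_stump(stump, row) for stump in forest)
    # Step 4: the most-voted label wins
    return votes.most_common(1)[0][0]

# Toy data: class "a" has small feature values, class "b" has large ones
rows = [(1.0, 2.0), (1.5, 1.0), (2.0, 2.5), (2.5, 1.5),
        (7.0, 8.0), (7.5, 7.0), (8.0, 8.5), (8.5, 7.5)]
labels = ["a"] * 4 + ["b"] * 4

forest = fit_forest(rows, labels, n_trees=25, n_features=1, rng=random.Random(42))
pred_low = predict_forest(forest, (0.5, 0.5))
pred_high = predict_forest(forest, (9.0, 9.0))
print(pred_low, pred_high)
```

An odd number of trees is used so that a two-class vote cannot end in a tie.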


Pros and cons of Random forest:
Pros: 
  1. It reduces overfitting by averaging or combining the results of different decision trees.
  2. Random forest works better on large data sets than a single decision tree does.
  3. Random forest has lower variance than a single decision tree.
  4. Random forests are flexible (they handle both classification and regression) and usually achieve high accuracy.
Cons:
  1. Complexity is the main disadvantage of Random forest.
  2. Constructing a Random forest is harder and more time-consuming than building a single decision tree.
  3. More computational resources are required to train and run a Random forest.
  4. It is less interpretable when the forest contains a large collection of decision trees.
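
The lower-variance point in the pros list can be illustrated with a small simulation. This is a rough sketch under a strong assumption: each tree's prediction is modelled as the true value plus independent noise (the `noisy_prediction` function is a hypothetical stand-in, not a real tree). Trees in an actual forest are trained on overlapping data and are therefore correlated, so the real reduction in variance is smaller than what this shows.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

def noisy_prediction(rng):
    """Stand-in for one tree's prediction: true value 10 plus Gaussian noise."""
    return 10.0 + rng.gauss(0.0, 1.0)

n_trials = 2000
n_trees = 25

# Predictions from a single "tree"
single = [noisy_prediction(rng) for _ in range(n_trials)]

# Predictions from a "forest": the average of n_trees independent predictions
averaged = [
    statistics.fmean([noisy_prediction(rng) for _ in range(n_trees)])
    for _ in range(n_trials)
]

var_single = statistics.pvariance(single)
var_averaged = statistics.pvariance(averaged)
print(var_single, var_averaged)
```

The averaged predictions cluster much more tightly around the true value than the single predictions do, which is the effect bagging relies on.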
