Random forests: a collection of Decision trees!
By Siddhi Ruparelia on Thursday, 04 June 2020
Category: Analytics


In the literal sense, a forest is an area full of trees. Likewise, in the technical sense, a Random Forest is essentially a collection of Decision Trees. Both are supervised classification algorithms, so which one is better to use?

A Decision Tree is built on the entire data set, using all the features/variables, while a Random Forest (as the name suggests) randomly selects observations/rows and subsets of features/variables to build several decision trees, then aggregates their predictions. Each tree "votes" for a class, and the class receiving the most votes is the "winner", i.e. the predicted class.
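
To make the idea concrete, here is a minimal sketch of that row-and-feature sampling plus majority vote, written with scikit-learn on a synthetic dataset. The particular numbers (25 trees, 3 features per tree, 500 samples) are illustrative assumptions, not fixed parts of the algorithm.

# A minimal sketch of the random forest idea: bootstrap rows, subset
# features, train one tree per sample, then take a majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rng = np.random.default_rng(0)
trees, feature_sets = [], []
for _ in range(25):  # number of trees is an illustrative choice
    # Randomly select rows (a bootstrap sample) and a subset of features.
    rows = rng.integers(0, len(X), size=len(X))
    feats = rng.choice(X.shape[1], size=3, replace=False)
    trees.append(DecisionTreeClassifier().fit(X[rows][:, feats], y[rows]))
    feature_sets.append(feats)

# Each tree "votes"; the majority class is the forest's prediction.
votes = np.array([t.predict(X[:, f]) for t, f in zip(trees, feature_sets)])
forest_pred = (votes.mean(axis=0) > 0.5).astype(int)

In practice you would simply use sklearn.ensemble.RandomForestClassifier, which implements this sampling and voting for you.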

A Decision Tree is comparatively easy to interpret and visualize, works well on large datasets, and can handle categorical as well as numerical data. However, it makes a greedy, locally optimal choice at each node rather than guaranteeing a globally optimal tree, and decision trees are also vulnerable to overfitting.
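
The overfitting risk is easy to demonstrate. In the sketch below (again on an assumed synthetic dataset, with some label noise added via flip_y), an unconstrained tree grows to full depth and memorises the training data, so its training accuracy far exceeds its test accuracy.

# A small sketch of decision tree overfitting on noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)  # full depth
print("train accuracy:", tree.score(X_tr, y_tr))  # near 1.0 (memorised)
print("test accuracy:", tree.score(X_te, y_te))   # noticeably lower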

Random Forests come to our rescue in such situations. Because each tree is trained on a different random sample and the results are aggregated and averaged, they are more robust than individual decision trees. Random Forests are a stronger modelling technique than Decision Trees.
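
A quick way to see this robustness is to cross-validate both models on the same noisy data, as in the sketch below. The exact scores depend on the dataset, but the averaged ensemble typically outperforms the single tree.

# A hedged comparison sketch: single tree vs. forest under cross-validation.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, flip_y=0.2,
                           random_state=0)

dt = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5)
rf = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                     X, y, cv=5)
print(f"decision tree: {dt.mean():.3f}  random forest: {rf.mean():.3f}")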

Read more at: https://www.analyticsvidhya.com/blog/2020/05/decision-tree-vs-random-forest-algorithm/
