**Table of Contents:**

- What is a Decision Tree?
- Decision Trees in Machine Learning
- Types of Decision Trees
- The Expressiveness of the Decision Trees
- Hypothesis Space
- Decision Tree making Approaches
- Information Gain
- Advantages and Disadvantages of the Decision Trees

## What is a Decision Tree?

Generally, a decision tree is a type of chart that helps determine a specific set of actions. Just like a usual tree, it also has various branches. Every decision tree branch represents a different outcome or possible reaction to a problem. Moreover, the furthermost branches of this tree represent the ending results. These trees have great significance in decision-making. They provide all the possible outcomes at any time to visualize all the circumstances. Decision trees also enable the possibility of analyzing the result or consequence of any decision.

A decision tree often represents a flowchart structure. Here, every internal node corresponds to a test based on a feature, and every leaf node expresses a class label or a decision to make after the computation of all the features. The branches, however, represent a conjunction of features leading to the class labels. Moreover, the classification rules are the paths going from root to leaf.

## Decision Trees in Machine Learning

Utilizing decision trees is one of the approaches from statistics, data mining, and machine learning. These are a type of supervised machine learning. Here, the data analysis technique splits the data into various possible entities concerning a specific parameter. There are two main entities in a decision tree. These entities are nodes and leaves. The supervised learning method uses the decision tree as a predictive model to analyze an item’s observation in the branches to conclude the item’s target value in the leaves. The decision nodes represent the splitting of data, and the leaves represent the outcomes.

### Terminologies of Decision Tree

The decision tree usually represents human-like thinking traits in order to make a wise decision. Therefore decision trees are easily understandable. Moreover, there is a tree-like structure behind the decision tree, so the logic is easily comprehensible. In order to make it more understandable, the following are the terminologies to consider.

**Root Node:**This is where the decision tree starts. The root node represents the complete dataset which divides into two or more comparable sets.**Leaf Node:**The leaf nodes represent the final output. The algorithm can not segregate the tree further after coming to the leaf node.**Splitting:**This term means dividing the root node or the decision node into different sub-nodes according to the underlying conditions.**Pruning:**It is the process of cutting down the extra branches from the tree. It helps in concluding much sooner by excluding irrelevant branches.**Parent and Child node:**The parent node is the tree’s root node, while the other nodes are child nodes of the parent.**Branch/ Sub-Tree:**The splitting of the main tree forms new subtrees and branches.

## Types of Decision Trees

Decision trees are the non-parametric type of supervised learning. These are two main types of decision trees in machine learning, depending upon the target variable:

- Classification Trees
- Regression Trees

### Classification Trees

Classification trees are a type of decision trees where the target variable’s value can be discrete. It is a categorical variable decision tree where the value can be either this or that. It determines the result after considering the given information. e.g. A decision tree with two or more branches depending on a discrete set of values is a classification or decision tree.

#### Example of Classification Tree

A simple example of a classification tree is a dataset determining whether to play golf or not, considering the weather conditions outside. In this example, Outlook is the decision node that further splits into Sunny, Overcast, and Rainy. Here, the leaf node is Play, which concludes to Yes or No after traversing the tree on given conditions.

### Regression Trees

Regression trees are a type of decision trees where the target variable’s value can be continuous. It is a continuous variable decision tree where the variable’s values can be like real numbers. Here, the tree’s outcome can be like numbers, such as 246. Generally, a regression tree building involves inputs, which are a mixture of continuous and discrete variables. Each decision node analyzes inputs for testing a variable’s value. The regression tree works on binary recursive partitioning. It splits the data into partitions in every iteration and then splits those partitions into different smaller groups while moving up the branch.

#### Example of Regression Tree

An example of a regression tree can be predicting the selling prices of houses. The outcome of this example results in a continuous dependent value. In this example, the various determining factors can be the house’s space in square feet, which is a constant factor. There can also be categorical variables such as the style of the house, area or location, and many more.

## The Expressiveness of the Decision Trees

The decision trees can express any function of the input attributes, such as the users can construct a path to a leaf from the truth tables rows for the boolean functions. Generally, the users design the decision tree so that there is only one path from the root to each leaf for any training set unless there is any non-deterministic factor involved.

The XOR function is the function that returns true for the opposite inputs and false for the same input values. Below is the truth table for the XOR function:

A |
B |
A XOR B |

T | T | F |

T | F | T |

F | T | T |

F | F | F |

Below is the decision tree for the above XOR function, constructed from the truth table’s rows.

## Hypothesis Space

The users can construct various trees based on the number of boolean attributes involved in their construction. The number of different decision trees with the n number of attributes is equal to the number of boolean functions. It means that the number of unique truth tables with 2^n = 2^2^n. For instance, the 2 boolean attributes can make 2^2^2 distinct trees.

## Decision Tree making Approaches

There can be various features and attributes on which the users can split a tree or decide the nodes’ output. An instance could be, the users can choose to wait for the table to get empty in a restaurant based on how hungry they are, the waiting estimate given by the manager, or the waiting line’s length. The users’ primary goal while choosing the attributes is that they want to make a compact tree that is consistent with their examples, and the most significant attribute is at the root of the tree. Therefore, there are various questions the users ask themselves while making decisions at every node, in which the most crucial factor is information gain. Another popular technique of Attribute selection measure or ASM is Gini Index. By using ASM, the selection of the best attribute for the nodes of the tree becomes easier. Therefore, Information Gain and Gini Index are two popular techniques for ASM.

## Information Gain

The information gain is one of the decision points for users. They use it to choose the attributes on which they should split the branches at every tree level. As the decision tree should be compact and small, the chosen attribute should give more information about the class than other attributes. The users prioritize the attribute with the highest information gain and use it for the first split. In this way, this process continues until the information gain becomes 0.

There could be many metrics to define information gain and the algorithms for making these decisions. However, these algorithms mostly follow a top-down approach by choosing the best split at every tree level to give the best results.

### Gini Index

Gini index is a criterion of measuring purity and impurity while forming a decision tree in the Classification and Regression Tree (CART) algorithm. Gini Index is also known as the Gini impurity. Gini Index calculates the probability of a specific feature having that classifies as incorrectly in random selection. The values of the Gini Index vary between 0 and 1. The wise decision is to choose an attribute with a low Gini Index compared to a high Gini Index. The CART algorithm uses the Gini index for creating binary splits. The formula for determining the Gini I dex is to deduct the squared probabilities of each class from 1.

## Advantages and Disadvantages of the Decision Trees

It should be noted that decision trees are simple and the outcomes are purely based on decisions. In simple words, we can conclude that they are easy to use. Adding to this, they don’t even require computer scientists to understand them. Lastly, decision trees can handle numerical and categorical data with less prepossessing to prevent outliers in the data.

Although the decision trees are essential in machine learning, they are prone to overfitting, therefore, require early stopping or pruning of the branches. If some classes dominate in the data, the final Tree can be biased towards it. Lastly, it is complicated to choose the perfect attributes for tree splitting, and the users must be vigilant with parameter tuning.