**Table of Contents:**

- What is a Decision Tree?
- Decision Trees in Machine Learning
- Types of Decision Trees
- The Expressiveness of the Decision Trees
- Hypothesis Space
- Decision Tree making Approaches
- Information Gain
- Advantages and Disadvantages of the Decision Trees

## What is a Decision Tree?

Generally, a decision tree is a type of chart that helps determine a specific set of actions. Just like a usual tree, it also has various branches. Every decision tree branch represents a different outcome or possible reaction to a problem. Moreover, the furthermost branches of this tree represent the ending results. These trees have great significance in decision-making. They provide all the possible outcomes at any time to visualize all the circumstances. Decision trees also enable the possibility of analyzing the result or consequence of any decision.

## Decision Trees in Machine Learning

Utilizing decision trees is one of the approaches from statistics, data mining, and machine learning. These are a type of supervised machine learning. Here, the data analysis technique splits the data into various possible entities concerning a specific parameter. There are two main entities in a decision tree. These entities are nodes and leaves. The supervised learning method uses the decision tree as a predictive model to analyze an item’s observation in the branches to conclude the item’s target value in the leaves. The decision nodes represent the splitting of data, and the leaves represent the outcomes.

## Types of Decision Trees

These are two main types of decision trees in machine learning, depending upon the target variable:

- Classification Trees
- Regression Trees

### Classification Trees

Classification trees are a type of decision trees where the target variable’s value can be discrete. It is a categorical variable decision tree where the value can be either this or that. It determines the result after considering the given information. e.g. A decision tree with two or more branches depending on a discrete set of values is a classification or decision tree.

#### Example of Classification Tree

A simple example of a classification tree is a dataset determining whether to play golf or not, considering the weather conditions outside. In this example, Outlook is the decision node that further splits into Sunny, Overcast, and Rainy. Here, the leaf node is Play, which concludes to Yes or No after traversing the tree on given conditions.

### Regression Trees

Regression trees are a type of decision trees where the target variable’s value can be continuous. It is a continuous variable decision tree where the variable’s values can be like real numbers. Here, the tree’s outcome can be like numbers, such as 246. Generally, a regression tree building involves inputs, which are a mixture of continuous and discrete variables. Each decision node analyzes inputs for testing a variable’s value. The regression tree works on binary recursive partitioning. It splits the data into partitions in every iteration and then splits those partitions into different smaller groups while moving up the branch.

#### Example of Regression Tree

An example of a regression tree can be predicting the selling prices of houses. The outcome of this example results in a continuous dependent value. In this example, the various determining factors can be the house’s space in square feet, which is a constant factor. There can also be categorical variables such as the style of the house, area or location, and many more.

## The Expressiveness of the Decision Trees

The decision trees can express any function of the input attributes, such as the users can construct a path to a leaf from the truth tables rows for the boolean functions. Generally, the users design the decision tree so that there is only one path from the root to each leaf for any training set unless there is any non-deterministic factor involved.

The XOR function is the function that returns true for the opposite inputs and false for the same input values. Below is the truth table for the XOR function:

A |
B |
A XOR B |

T | T | F |

T | F | T |

F | T | T |

F | F | F |

Below is the decision tree for the above XOR function, constructed from the truth table’s rows.

## Hypothesis Space

The users can construct various trees based on the number of boolean attributes involved in their construction. The number of different decision trees with the n number of attributes is equal to the number of boolean functions. It means that the number of unique truth tables with 2^n = 2^2^n. For instance, the 2 boolean attributes can make 2^2^2 distinct trees.

## Decision Tree making Approaches

There can be various features and attributes on which the users can split a tree or decide the nodes’ output. An instance could be, the users can choose to wait for the table to get empty in a restaurant based on how hungry they are, the waiting estimate given by the manager, or the waiting line’s length. The users’ primary goal while choosing the attributes is that they want to make a compact tree that is consistent with their examples, and the most significant attribute is at the root of the tree. Therefore, there are various questions the users ask themselves while making decisions at every node, in which the most crucial factor is information gain.

## Information Gain

The information gain is one of the decision points for users. They use it to choose the attributes on which they should split the branches at every tree level. As the decision tree should be compact and small, the chosen attribute should give more information about the class than other attributes. The users prioritize the attribute with the highest information gain and use it for the first split. In this way, this process continues until the information gain becomes 0.

There could be many metrics to define information gain and the algorithms for making these decisions. However, these algorithms mostly follow a top-down approach by choosing the best split at every tree level to give the best results.

## Advantages and Disadvantages of the Decision Trees

It should be noted that decision trees are simple and the outcomes are purely based on decisions. In simple words, we can conclude that they are easy to use. Adding to this, they don’t even require computer scientists to understand them. Lastly, decision trees can handle numerical and categorical data with less prepossessing to prevent outliers in the data.

Although the decision trees are essential in machine learning, they are prone to overfitting, therefore, require early stopping or pruning of the branches. If some classes dominate in the data, the final Tree can be biased towards it. Lastly, it is complicated to choose the perfect attributes for tree splitting, and the users must be vigilant with parameter tuning.