Important Terminology

1. Root Node: The starting node of the tree.

2. Leaf Node: A final node of the tree, where a path terminates.

3. Internal Node: A node that is neither the root node nor a leaf node.
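
To make the three terms concrete, here is a minimal sketch that labels every node in a small tree. The nested-dict layout, the `node_kinds` helper, and the feature and label names are all hypothetical, chosen only for illustration; this is not the structure any particular library uses.

```python
# Hypothetical layout: a decision node has "feature" and "children",
# a leaf has only "label". The tree itself is illustrative, not learned.
tree = {
    "feature": "Outlook",
    "children": {
        "Overcast": {"label": "Yes"},
        "Sunny": {
            "feature": "Humidity",
            "children": {
                "High": {"label": "No"},
                "Normal": {"label": "Yes"},
            },
        },
    },
}

def node_kinds(node, is_root=True):
    """Yield (name, kind) for every node, visiting the tree depth first."""
    if "label" in node:
        # No children: the path terminates here.
        yield node["label"], "leaf"
        return
    # A node with children is the root if it is the starting node,
    # otherwise it is an internal node.
    yield node["feature"], "root" if is_root else "internal"
    for child in node["children"].values():
        yield from node_kinds(child, is_root=False)

kinds = list(node_kinds(tree))
for name, kind in kinds:
    print(f"{name}: {kind}")
```

In this sketch `Outlook` is the root, `Humidity` is an internal node, and the three `Yes`/`No` entries are leaves.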

import pandas as pd

data = pd.DataFrame({'Outlook':('Sunny','Sunny','Overcast','Rain','Rain','Rain','Overcast','Sunny','Sunny','Rain','Sunny','Overcast','Overcast','Rain'),
                     'Temperature':('Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Cold'),
                     'Humidity':('High','High','High','High','Normal','Normal','Normal','High','Normal','Normal','Normal','High','Normal','High'),
                     'Wind':('Weak','Strong','Weak','Weak','Weak','Strong','Strong','Weak','Weak','Weak','Strong','Strong','Weak','Strong'),
                     'Target':('No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No')})
data
     Outlook  Temperature  Humidity    Wind  Target
0      Sunny          Hot      High    Weak      No
1      Sunny          Hot      High  Strong      No
2   Overcast          Hot      High    Weak     Yes
3       Rain         Mild      High    Weak     Yes
4       Rain         Cool    Normal    Weak     Yes
5       Rain         Cool    Normal  Strong      No
6   Overcast         Cool    Normal  Strong     Yes
7      Sunny         Mild      High    Weak      No
8      Sunny         Cool    Normal    Weak     Yes
9       Rain         Mild    Normal    Weak     Yes
10     Sunny         Mild    Normal  Strong     Yes
11  Overcast         Mild      High  Strong     Yes
12  Overcast          Hot    Normal    Weak     Yes
13      Rain         Cold      High  Strong      No

Entropy

Formula:

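\(E(X) = -\sum_{i=1}^{K} P(x_i)\,\log_b P(x_i)\), where K is the number of classes, \(P(x_i)\) is the proportion of samples in class \(x_i\), and b is the base of the logarithm (b = 2 gives entropy in bits).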
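
The formula can be sketched as a small Python function. The name `entropy` and its `base` parameter are assumptions made for this example, not part of pandas or any other library. Because \(-p \log_b p = p \log_b (1/p)\), the code sums the second form, which is equivalent to the formula above.

```python
import math
from collections import Counter

def entropy(labels, base=2):
    """Entropy of a sequence of class labels (hypothetical helper)."""
    total = len(labels)
    # P(x_i): proportion of samples that belong to each class.
    probabilities = [count / total for count in Counter(labels).values()]
    # Sum p * log_b(1/p) over the classes, equal to -sum(p * log_b(p)).
    return sum(p * math.log(1 / p, base) for p in probabilities)

balanced = entropy(['Yes', 'Yes', 'No', 'No'])  # two equally likely classes
pure = entropy(['Yes', 'Yes', 'Yes'])           # a single class
print(balanced, pure)
```

With base 2, an evenly split two-class sample has the maximum entropy of 1 bit, and a sample containing a single class has entropy 0.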

data.Target.value_counts()
Yes    9
No     5
Name: Target, dtype: int64

So when we substitute these counts into the entropy formula