Decision Trees
Important Terminology
1. Root Node: The starting node of the tree.
2. Leaf Node: A final node of the tree, where a path terminates.
3. Internal Node: A node that is neither the root node nor a leaf node.
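To make the three terms concrete, here is a minimal sketch of how a tree node could be represented. The `Node` class, its fields, and the hand-built tree are hypothetical and only for illustration; the feature names are borrowed from the weather data used in this post, and the tree is written by hand rather than learned.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """One node of a decision tree (hypothetical structure for illustration)."""
    feature: Optional[str] = None      # feature tested at this node; None for a leaf
    prediction: Optional[str] = None   # class label returned by a leaf
    children: dict = field(default_factory=dict)  # feature value -> child Node

    def is_leaf(self) -> bool:
        # A leaf has no children, so any path that reaches it terminates here.
        return not self.children


# Root node: the starting node of the tree.
root = Node(feature="Outlook")

# Leaf node: the path ends here with a prediction.
root.children["Overcast"] = Node(prediction="Yes")

# Internal node: neither the root nor a leaf; it tests another feature.
humidity = Node(feature="Humidity")
root.children["Sunny"] = humidity
humidity.children["High"] = Node(prediction="No")
humidity.children["Normal"] = Node(prediction="Yes")

print(root.is_leaf())                       # False
print(root.children["Overcast"].is_leaf())  # True
```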
import pandas as pd
data = pd.DataFrame({'Outlook':('Sunny','Sunny','Overcast','Rain','Rain','Rain','Overcast','Sunny','Sunny','Rain','Sunny','Overcast','Overcast','Rain'),
'Temperature':('Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Cold'),
'Humidity':('High','High','High','High','Normal','Normal','Normal','High','Normal','Normal','Normal','High','Normal','High'),
'Wind':('Weak','Strong','Weak','Weak','Weak','Strong','Strong','Weak','Weak','Weak','Strong','Strong','Weak','Strong'),
'Target':('No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No')})
data
| | Outlook | Temperature | Humidity | Wind | Target |
---|---|---|---|---|---|
0 | Sunny | Hot | High | Weak | No |
1 | Sunny | Hot | High | Strong | No |
2 | Overcast | Hot | High | Weak | Yes |
3 | Rain | Mild | High | Weak | Yes |
4 | Rain | Cool | Normal | Weak | Yes |
5 | Rain | Cool | Normal | Strong | No |
6 | Overcast | Cool | Normal | Strong | Yes |
7 | Sunny | Mild | High | Weak | No |
8 | Sunny | Cool | Normal | Weak | Yes |
9 | Rain | Mild | Normal | Weak | Yes |
10 | Sunny | Mild | Normal | Strong | Yes |
11 | Overcast | Mild | High | Strong | Yes |
12 | Overcast | Hot | Normal | Weak | Yes |
13 | Rain | Cold | High | Strong | No |
Entropy
Formula:
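\(E(X) = -\sum_{i=1}^{K} P(x_i)\log_b P(x_i)\), where K is the number of classes, \(P(x_i)\) is the proportion of samples belonging to class \(x_i\), and b is the base of the logarithm (b = 2 gives entropy in bits).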
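The formula translates almost directly into code. The sketch below is one possible way to write it; the function name `entropy` and the default `base=2` are assumptions made for illustration, not something fixed by the formula.

```python
import math


def entropy(probabilities, base=2):
    """Entropy of a discrete distribution given as a list of class probabilities."""
    # Classes with P(x_i) = 0 are skipped: they contribute 0 by convention,
    # and log(0) is undefined.
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


print(entropy([0.5, 0.5]))  # two equally likely classes: maximum uncertainty
print(entropy([0.9, 0.1]))  # one dominant class: lower entropy
```

The more evenly the samples are spread across the classes, the higher the entropy.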
data.Target.value_counts()
Yes 9
No 5
Name: Target, dtype: int64
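So when we substitute these counts into the entropy formula (taking b = 2), with 9 "Yes" and 5 "No" out of 14 rows:
\(E(Target) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940\)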
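As a quick numerical check of this substitution, the snippet below plugs the two class counts into the formula. It is a small sketch that hard-codes the counts from the `value_counts()` output and assumes a base-2 logarithm; with the DataFrame at hand, `data.Target.value_counts(normalize=True)` would give the same proportions.

```python
import math

# Class counts copied from the value_counts() output.
counts = {"Yes": 9, "No": 5}
total = sum(counts.values())  # 14 rows in total

# P(x_i) for each class: 9/14 and 5/14.
probs = {label: n / total for label, n in counts.items()}

# Base 2 is an assumption; it expresses the entropy in bits.
target_entropy = -sum(p * math.log2(p) for p in probs.values())

print(round(target_entropy, 3))  # 0.94
```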
If there are any typos or mistakes in this article, please let me know by commenting below. Thank you for reading.