Decision Trees
Important Terminology
1. Root Node: The starting node of the tree.
2. Leaf Node: A final node of the tree, where a path terminates and a prediction is made.
3. Internal Node: A node that is neither the root node nor a leaf node.
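To make the three terms concrete, here is a minimal sketch of a tree as a Python data structure. The `Node` class, its field names, and the small hand-built tree are hypothetical and only illustrate the terminology; the tree is partial and was not learned by any algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    # Feature tested at this node (None for a leaf).
    feature: Optional[str] = None
    # Class label stored at a leaf (None for root/internal nodes).
    prediction: Optional[str] = None
    # Maps a feature value to the child node reached by that value.
    children: Dict[str, "Node"] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        # A leaf has no children: the path terminates here.
        return not self.children


# Hand-built, partial toy tree (illustrative only).
root = Node(feature="Outlook", children={           # root node
    "Overcast": Node(prediction="Yes"),             # leaf node
    "Sunny": Node(feature="Humidity", children={    # internal node
        "High": Node(prediction="No"),              # leaf node
        "Normal": Node(prediction="Yes"),           # leaf node
    }),
})

print(root.is_leaf())                       # root is not a leaf
print(root.children["Overcast"].is_leaf())  # path ends here
```

Here `root` is the root node, the `"Sunny"` child is an internal node because it has both a parent and children, and every node carrying a `prediction` is a leaf.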
import pandas as pd
data = pd.DataFrame({'Outlook':('Sunny','Sunny','Overcast','Rain','Rain','Rain','Overcast','Sunny','Sunny','Rain','Sunny','Overcast','Overcast','Rain'),
'Temperature':('Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Cold'),
'Humidity':('High','High','High','High','Normal','Normal','Normal','High','Normal','Normal','Normal','High','Normal','High'),
'Wind':('Weak','Strong','Weak','Weak','Weak','Strong','Strong','Weak','Weak','Weak','Strong','Strong','Weak','Strong'),
'Target':('No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No')})
data
| | Outlook | Temperature | Humidity | Wind | Target |
|---|---|---|---|---|---|
| 0 | Sunny | Hot | High | Weak | No |
| 1 | Sunny | Hot | High | Strong | No |
| 2 | Overcast | Hot | High | Weak | Yes |
| 3 | Rain | Mild | High | Weak | Yes |
| 4 | Rain | Cool | Normal | Weak | Yes |
| 5 | Rain | Cool | Normal | Strong | No |
| 6 | Overcast | Cool | Normal | Strong | Yes |
| 7 | Sunny | Mild | High | Weak | No |
| 8 | Sunny | Cool | Normal | Weak | Yes |
| 9 | Rain | Mild | Normal | Weak | Yes |
| 10 | Sunny | Mild | Normal | Strong | Yes |
| 11 | Overcast | Mild | High | Strong | Yes |
| 12 | Overcast | Hot | Normal | Weak | Yes |
| 13 | Rain | Cold | High | Strong | No |
Entropy
Formula:
\(E(X) = -\sum_{i=1}^{K} P(x_i)\log_b P(x_i)\), where K is the number of classes and \(P(x_i)\) is the proportion of samples belonging to class \(x_i\).
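The formula can be sketched directly in Python. This is a minimal illustration, not a library function: the name `entropy` and the default `base=2` (which gives entropy in bits) are assumptions, since the formula above leaves the base b open.

```python
import math


def entropy(probs, base=2):
    """Entropy of a discrete distribution given as a list of probabilities.

    Zero probabilities are skipped, following the usual convention
    that 0 * log(0) is treated as 0.
    """
    return -sum(p * math.log(p, base) for p in probs if p > 0)


# Two equally likely classes: maximum uncertainty for K = 2.
even_split = entropy([0.5, 0.5])

# A single class with probability 1: no uncertainty.
pure = entropy([1.0])
```

With base 2, an even split between two classes gives an entropy of 1, and a pure node (all samples in one class) gives 0. The lower the entropy, the more homogeneous the node.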
data.Target.value_counts()
Yes 9
No 5
Name: Target, dtype: int64
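Substituting these counts into the entropy formula, with 9 "Yes" and 5 "No" out of 14 samples and taking b = 2, gives \(E(\text{Target}) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940\).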
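The entropy of the Target column can be checked with a short sketch. The helper name `entropy_of_labels` is hypothetical, and base 2 is an assumption because the formula leaves b unspecified. The label list is copied from the Target column above so that the block runs without pandas.

```python
import math
from collections import Counter


def entropy_of_labels(labels, base=2):
    """Entropy of a sequence of class labels, using class frequencies as P(x_i)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total, base) for c in counts.values())


# Values of the Target column from the table above.
target = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
          'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

target_entropy = entropy_of_labels(target)
print(f"{target_entropy:.3f}")  # 0.940
```

Because the function only iterates over the labels, it should also accept the DataFrame column directly, for example `entropy_of_labels(data.Target)`.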
If there are any typos or mistakes in this article, please let me know by commenting below. Thank you for reading.