Gini Impurity
Suppose we have a dataset in which each record belongs to one of two classes, 0 or 1.
The first example we investigate is a pure 0 dataset.
object |
---|
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
0 |
For such an all-0 dataset, we would like to define its impurity as 0. The same holds for an all-1 dataset. For a dataset with 50% of 1 and 50% of 0, we would like the impurity to take its maximum value, because the measure should be symmetric under swapping 0 and 1.
Definition
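Given a dataset whose records each belong to one of several classes $i$, the Gini impurity is

$$
G = \sum_i p_i (1 - p_i) = 1 - \sum_i p_i^2,
$$

where $p_i$ is the fraction of records that belong to class $i$.

In the above example, we have two classes, 0 and 1, with $p_0 = 1$ and $p_1 = 0$. The Gini impurity is

$$
G = p_0 (1 - p_0) + p_1 (1 - p_1) = 0.
$$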
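As a rough illustration, the calculation for the all-0 dataset can be sketched in Python. This assumes the standard definition $G = \sum_i p_i (1 - p_i)$, with $p_i$ taken as the observed fraction of records in class $i$; the function name `gini_impurity` is a hypothetical helper, not something from the original note.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity G = sum_i p_i * (1 - p_i) for a sequence of class labels.

    p_i is estimated as the fraction of records carrying label i.
    (Hypothetical helper written for illustration.)
    """
    n = len(labels)
    if n == 0:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return sum((c / n) * (1 - c / n) for c in counts.values())


# The all-0 dataset from the first table: 12 records, all in class 0.
all_zeros = [0] * 12
print(gini_impurity(all_zeros))  # 0.0
```

Because only one class is present, its fraction is 1 and the single term in the sum vanishes.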
Examples
Suppose we have another dataset in which roughly 50% of the values are 1.
object |
---|
0 |
0 |
1 |
0 |
0 |
1 |
1 |
1 |
0 |
0 |
0 |
1 |
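Treating the split as exactly 50/50, so that $p_0 = p_1 = 0.5$, the Gini impurity is

$$
G = p_0 (1 - p_0) + p_1 (1 - p_1) = 0.5 \times 0.5 + 0.5 \times 0.5 = 0.5,
$$

which is the maximum for two classes. The 12 records listed above contain seven 0s and five 1s, which gives $G = 35/72 \approx 0.486$, slightly below that maximum.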
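The two-class example can be checked numerically with a short sketch. It uses the form $G = 1 - \sum_i p_i^2$, which equals $\sum_i p_i (1 - p_i)$ whenever the probabilities sum to 1; the function name `gini_from_probabilities` is a hypothetical helper chosen for this illustration.

```python
def gini_from_probabilities(probs):
    """Gini impurity G = 1 - sum_i p_i**2 for class probabilities summing to 1.

    (Hypothetical helper written for illustration.)
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return 1.0 - sum(p * p for p in probs)


# An exact 50/50 split between the two classes.
balanced = gini_from_probabilities([0.5, 0.5])
print(balanced)  # 0.5

# The 12 records as listed in the table above (seven 0s, five 1s).
records = [0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1]
p1 = sum(records) / len(records)  # fraction of records equal to 1
listed = gini_from_probabilities([1 - p1, p1])
print(round(listed, 4))  # 0.4861
```

The listed records are close to balanced, so their impurity sits just under the two-class maximum of 0.5.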
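For data with two possible values, the Gini impurity can be visualized as a function of the class probabilities: with $p$ the fraction of records in one of the two classes, $G = 2p(1 - p)$, which is 0 at $p = 0$ and $p = 1$ and reaches its maximum of 0.5 at $p = 0.5$.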
For data with three possible values, the Gini impurity is also visualized using the same chart given the condition that
L Ma (2020). 'Gini Impurity', Datumorphism, 01 April. Available at: https://datumorphism.leima.is/cards/machine-learning/measurement/gini-impurity/.