Gini Impurity - Python Automation and Machine Learning for ICs - An Online Book
Python Automation and Machine Learning for ICs http://www.globalsino.com/ICs/
=================================================================================

"gini" in sklearn.tree refers to the Gini impurity criterion used for splitting nodes in a decision tree. The name comes from the Gini index (Gini coefficient), a measure of statistical dispersion that represents the inequality of a distribution. However, the Gini impurity used in decision trees is a different quantity. It measures how often a randomly chosen element of a node would be incorrectly classified if it were labeled at random according to the class distribution in that node.

In decision trees, the Gini impurity is used to evaluate the purity of a node. A Gini impurity of 0 means the node is perfectly pure (all elements belong to the same class). The impurity is largest when the classes are evenly mixed: it reaches 1 - 1/c for c equally represented classes (0.5 for a binary problem) and approaches 1 only as the number of classes grows. A lower Gini impurity therefore indicates a "purer" node dominated by one class.

The DecisionTreeClassifier in scikit-learn uses the Gini impurity by default (criterion="gini") as the criterion for making splits. The Gini impurity of a node is calculated from the distribution of classes (target values) in that node:

$$\mathrm{Gini}(D) = 1 - \sum_{i=1}^{c} p_i^2 \qquad [3759a]$$

where:
D is the dataset at a particular node.
c is the number of classes.
p_i is the proportion of instances of class i in the node.

When building the tree, the algorithm chooses, at each node, the split that minimizes the weighted sum of the Gini impurities of the child nodes. Each child is weighted by its fraction of the parent's samples. Choosing splits this way makes the resulting tree more effective at classifying data.
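Equation [3759a] and the weighted split rule can be sketched in plain Python as follows. This is a minimal illustration and not scikit-learn's implementation. The helper names gini_impurity, weighted_split_gini and best_threshold are hypothetical. best_threshold only scans a single numeric feature using midpoints between sorted values, and an empty node is assumed to have impurity 0.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini(D) = 1 - sum_i p_i^2, where p_i is the fraction of class i in the node."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: treat an empty node as pure
    counts = Counter(labels)
    return 1.0 - sum((k / n) ** 2 for k in counts.values())


def weighted_split_gini(left: Sequence[Hashable], right: Sequence[Hashable]) -> float:
    """Gini impurity of the two child nodes, each weighted by its share of the samples."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def best_threshold(x: Sequence[float], y: Sequence[Hashable]) -> tuple[float, float]:
    """Hypothetical helper: try midpoints between sorted unique x values and
    return (threshold, weighted Gini) for the split with the lowest weighted Gini."""
    values = sorted(set(x))
    best = (float("nan"), float("inf"))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [label for xi, label in zip(x, y) if xi <= t]
        right = [label for xi, label in zip(x, y) if xi > t]
        score = weighted_split_gini(left, right)
        if score < best[1]:  # keep the first split with the lowest impurity
            best = (t, score)
    return best


if __name__ == "__main__":
    print(gini_impurity(["a", "a", "a", "a"]))  # pure node -> 0.0
    print(gini_impurity(["a", "a", "b", "b"]))  # 50/50 binary node -> 0.5
    print(gini_impurity(["a", "b", "c"]))       # three equal classes -> about 1 - 1/3
    x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    y = ["a", "a", "a", "b", "b", "b"]
    print(best_threshold(x, y))                 # (6.5, 0.0): both children are pure
```

In scikit-learn, the same criterion is selected with DecisionTreeClassifier(criterion="gini"), which is the default. The library's splitter also searches over all features and applies stopping rules such as max_depth and min_samples_split, which this sketch leaves out.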
============================================