
Why do we select entropy gain as the splitting criterion in decision tree learning instead of the decrease in error rate?

I've been following the ML course by Tom Mitchell, and in decision tree (DT) learning, entropy gain is chosen as the ruling criterion for selecting a feature/parameter x_i as the child of another feature during top-down growth of the DT.

Our goal in selecting a DT is always to avoid overfitting by minimizing the error rate; so why don't we use the error rate as the ruling criterion for feature/parameter selection during top-down growth of the tree?

Feature vector for the input data: X = <x_1, x_2, ..., x_n>
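To make the question concrete, here is a small sketch (not from the course) that computes both the information gain and the reduction in error rate for one candidate split on some feature x_i; the labels and the split below are made up purely for illustration:

import numpy as np

def entropy(y):
    # Shannon entropy (in bits) of a label array.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def misclassification_error(y):
    # Error rate if we predict the majority class at this node.
    _, counts = np.unique(y, return_counts=True)
    return 1.0 - counts.max() / counts.sum()

def gain(y, y_left, y_right, impurity):
    # Reduction in impurity from splitting y into y_left / y_right.
    n, nl, nr = len(y), len(y_left), len(y_right)
    return impurity(y) - (nl / n) * impurity(y_left) - (nr / n) * impurity(y_right)

# Hypothetical parent node and one candidate split on some feature x_i.
y       = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_left  = np.array([1, 1, 1, 0])   # samples with x_i below the threshold
y_right = np.array([1, 0, 0, 0])   # samples with x_i above the threshold

print("information gain    :", gain(y, y_left, y_right, entropy))
print("error-rate reduction:", gain(y, y_left, y_right, misclassification_error))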

You can't use the error rate because you don't know what it will be in the end. That is, imagine the tree eventually ends up with depth 10, and you are at level 2 of the tree, deciding which feature and which threshold to choose. At this stage you can't know what the error rate will be at level 10, so your criterion should be based only on the current level. That said, you don't have to use information gain. There are other criteria as well. For example, one such criterion is Gini impurity, and this is the default criterion used in scikit-learn's DecisionTreeClassifier.
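As a rough sketch of the scikit-learn usage mentioned above (the iris dataset and the random_state value are just illustrative choices, not part of the original answer), the criterion is selected via the criterion parameter:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Default criterion is Gini impurity; information gain is available via criterion="entropy".
gini_tree    = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
entropy_tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)

print(gini_tree.score(X, y), entropy_tree.score(X, y))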
