
Machine Learning - Decision Tree - splitting feature value

I have a question about splitting a node. I have 4 features and want to predict whether a person will play, maybe play, or not play. Based on information gain, Weather is the first feature to split on, which gives me Rainy, Hot and Humid as the branches. The Rainy branch is pure (all of its rows are No); Hot and Humid are not. I am trying to determine which feature value, Hot or Humid, I should select to grow/split next. I know I can choose the next feature by maximum information gain, and the next feature with the maximum information gain is Gender. But I don't know whether I should go further down from Hot or from Humid.

               Weather
      ___________|___________
     |           |           |
   Rainy        Hot        Humid
    No


Gender  YoungOrOld  Weather Mood    Play?
Male    0           Hot     Bad     Yes
Male    1           Hot     OK      Yes
Female  1           Hot     OK      Maybe
Female  0           Hot     Bad     Yes
Male    1           Hot     OK      Yes
Male    0           Humid   OK      Yes
Female  1           Humid   OK      Maybe
Female  1           Rainy   Good    No
Male    2           Rainy   OK      No
Female  2           Rainy   Good    No
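To check the claim that Weather has the highest information gain at the root, here is a minimal Python sketch (standard library only) that computes entropy and the information gain of a multiway split for each column of the table above. The helper names `entropy` and `information_gain` are my own, not from any library, and YoungOrOld is treated as a categorical value.

```python
from collections import Counter
from math import log2

# Rows copied from the table: (Gender, YoungOrOld, Weather, Mood, Play?)
ROWS = [
    ("Male", 0, "Hot", "Bad", "Yes"),
    ("Male", 1, "Hot", "OK", "Yes"),
    ("Female", 1, "Hot", "OK", "Maybe"),
    ("Female", 0, "Hot", "Bad", "Yes"),
    ("Male", 1, "Hot", "OK", "Yes"),
    ("Male", 0, "Humid", "OK", "Yes"),
    ("Female", 1, "Humid", "OK", "Maybe"),
    ("Female", 1, "Rainy", "Good", "No"),
    ("Male", 2, "Rainy", "OK", "No"),
    ("Female", 2, "Rainy", "Good", "No"),
]
FEATURES = ["Gender", "YoungOrOld", "Weather", "Mood"]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature_index):
    """Parent entropy minus the weighted entropy of a multiway split on one column."""
    groups = {}
    for r in rows:
        groups.setdefault(r[feature_index], []).append(r[-1])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - weighted


gains = {name: information_gain(ROWS, i) for i, name in enumerate(FEATURES)}
for name, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {gain:.3f}")
```

On this data it ranks Weather first (about 0.925 bits), followed by YoungOrOld, Mood and Gender.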

You have split the samples of your dataset on the feature "Weather". The node where Weather = Rainy is pure, so it needs no further splitting, unlike the impure nodes where Weather = Hot and Weather = Humid. Because they are impure, by default you should split both of them rather than picking one. You can also specify your own stopping criteria: besides stopping when a node is pure, you can set a minimum number of samples required to split a node, so that a node also stops being divided when it contains too few samples to perform a split.
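The default behaviour described here (split every impure node, stop when a node is pure or has too few samples) can be sketched as a small recursive ID3-style builder. This is a hypothetical illustration, not a library implementation: the parameter name `min_samples_split` is borrowed from scikit-learn's `DecisionTreeClassifier`, ties in information gain are broken by feature order, and the "no positive gain" stop is an extra assumption.

```python
from collections import Counter
from math import log2

# Rows copied from the table: (Gender, YoungOrOld, Weather, Mood, Play?)
ROWS = [
    ("Male", 0, "Hot", "Bad", "Yes"),
    ("Male", 1, "Hot", "OK", "Yes"),
    ("Female", 1, "Hot", "OK", "Maybe"),
    ("Female", 0, "Hot", "Bad", "Yes"),
    ("Male", 1, "Hot", "OK", "Yes"),
    ("Male", 0, "Humid", "OK", "Yes"),
    ("Female", 1, "Humid", "OK", "Maybe"),
    ("Female", 1, "Rainy", "Good", "No"),
    ("Male", 2, "Rainy", "OK", "No"),
    ("Female", 2, "Rainy", "Good", "No"),
]
FEATURES = ["Gender", "YoungOrOld", "Weather", "Mood"]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def partition(rows, i):
    """Group rows by their value in column i (a multiway split)."""
    groups = {}
    for r in rows:
        groups.setdefault(r[i], []).append(r)
    return groups


def gain(rows, i):
    """Information gain of a multiway split on column i."""
    parent = entropy([r[-1] for r in rows])
    children = sum(
        len(g) / len(rows) * entropy([r[-1] for r in g])
        for g in partition(rows, i).values()
    )
    return parent - children


def build(rows, features, min_samples_split=2):
    """Return a class label (leaf) or (feature_name, {value: subtree})."""
    labels = [r[-1] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stop when the node is pure, too small to split, or out of features.
    if len(set(labels)) == 1 or len(rows) < min_samples_split or not features:
        return majority
    best = max(features, key=lambda i: gain(rows, i))  # first feature wins ties
    if gain(rows, best) <= 0:  # extra rule: no split reduces entropy
        return majority
    remaining = [i for i in features if i != best]
    # Recurse into every child; pure children simply come back as leaves.
    return (FEATURES[best], {
        value: build(subset, remaining, min_samples_split)
        for value, subset in partition(rows, best).items()
    })


tree = build(ROWS, list(range(len(FEATURES))))
print(tree)
```

With the default `min_samples_split=2`, both the Hot and Humid branches get split on this data (Hot on Gender, then YoungOrOld for the Female rows; Humid on Gender). Passing `min_samples_split=3` turns the two-sample nodes into majority-vote leaves instead.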

You have already split on Weather and Gender. The Weather == Rainy node needs no more splitting; for the remaining rows (Hot or Humid), the Gender == Male node needs no more splitting either.
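A minimal sketch of the two binary tests described here (Weather == Rainy, then Gender == Male on the remaining rows), assuming the rows from the question's table; it only lists the classes that end up in each resulting node.

```python
# Rows copied from the table: (Gender, YoungOrOld, Weather, Mood, Play?)
ROWS = [
    ("Male", 0, "Hot", "Bad", "Yes"),
    ("Male", 1, "Hot", "OK", "Yes"),
    ("Female", 1, "Hot", "OK", "Maybe"),
    ("Female", 0, "Hot", "Bad", "Yes"),
    ("Male", 1, "Hot", "OK", "Yes"),
    ("Male", 0, "Humid", "OK", "Yes"),
    ("Female", 1, "Humid", "OK", "Maybe"),
    ("Female", 1, "Rainy", "Good", "No"),
    ("Male", 2, "Rainy", "OK", "No"),
    ("Female", 2, "Rainy", "Good", "No"),
]

# Binary test 1: Weather == Rainy vs. everything else.
rainy = [r for r in ROWS if r[2] == "Rainy"]
rest = [r for r in ROWS if r[2] != "Rainy"]

# Binary test 2, applied only to the non-Rainy rows: Gender == Male vs. Female.
nodes = {
    "Weather == Rainy": rainy,
    "not Rainy, Gender == Male": [r for r in rest if r[0] == "Male"],
    "not Rainy, Gender == Female": [r for r in rest if r[0] == "Female"],
}

for name, rows in nodes.items():
    classes = sorted({r[4] for r in rows})
    status = "pure" if len(classes) == 1 else "impure"
    print(f"{name:28s} n={len(rows)} classes={classes} ({status})")
```

The Rainy node holds only No and the non-Rainy Male node only Yes; the non-Rainy Female node still mixes Maybe and Yes, so it is the only node left to split.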

The split you propose would be Hot vs Humid, but it doesn't separate the classes: the Hot branch would still mix Yes and Maybe. Instead, split on YoungOrOld. Among the remaining Female rows, the two with YoungOrOld = 1 are Maybe; everyone else is Yes. Now all nodes are pure.
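To compare the two candidate splits for the remaining mixed node, here is a small sketch that computes the information gain of Hot vs Humid and of YoungOrOld on just the three non-Rainy Female rows, copied from the table. The helper names `entropy` and `split` are hypothetical, chosen for this illustration.

```python
from collections import Counter
from math import log2

# The non-Rainy Female rows from the table: (Gender, YoungOrOld, Weather, Mood, Play?)
NODE = [
    ("Female", 1, "Hot", "OK", "Maybe"),
    ("Female", 0, "Hot", "Bad", "Yes"),
    ("Female", 1, "Humid", "OK", "Maybe"),
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def split(rows, i):
    """Split rows on column i; return (information gain, {value: child labels})."""
    children = {}
    for r in rows:
        children.setdefault(r[i], []).append(r[-1])
    weighted = sum(len(c) / len(rows) * entropy(c) for c in children.values())
    return entropy([r[-1] for r in rows]) - weighted, children


results = {
    "Weather (Hot vs Humid)": split(NODE, 2),
    "YoungOrOld": split(NODE, 1),
}
for name, (g, children) in results.items():
    print(f"{name:22s} gain={g:.3f} children={children}")
```

Splitting on YoungOrOld recovers the node's full entropy (about 0.918 bits) and leaves two pure children, while Hot vs Humid recovers only about 0.252 bits and leaves the Hot child mixed.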
