简体   繁体   中英

Case of No examples left while constructing a Decision Tree

I was reading the topic of Decision Trees(page 720) from book Artificial Intelligence A Modern Approach 3rd edition. The book is describing some cases that may occur after we split the training set(examples) by choosing an attribute. One of the case mentioned is

If there are no examples left, it means that no example has been observed for this combination of attribute values, and we return a default value calculated from the plurality classification of all the examples that were used in constructing the node's parent.

I understand that by plurality classification they mean majority rule. But I am unable to understand the above cases ie when could it occur. Some example of decision tree where the above cases becomes true.

Think of the problem as constructing a 2D table of occurrence counts where the column represents some feature or class to be considered and the rows represent particular configurations of other variables.

for example,

X Y Z | class counts
------+-------------
1 1 1 | ...
1 1 2 | ...
1 1 3 | ...

The table represents the joint distribution of the training set.

A particular combination of X, Y and Z (say 1,3,1) may not have been seen during training. The more variables you have, the more likely you will encounter unseen combinations. If you have 10 variables each with two states then there are 1024 possible configurations of those variables. If there are three states for each then the number of configurations would be 3 ^ 10, etc.

Frankly, I would use 1/numberCols for any particular column with a missing row as you don't really have any information regarding it. You could use 1/Sum(rows) for each column but this may unnecessarily bias the result. Depends on the data.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM