
What does Random Forest do with unseen data?

When I built my random forest model using scikit-learn in Python, I set a condition (a WHERE clause in the SQL query) so that the training data only contains values greater than 0.

I am curious to know how random forest handles test data whose value is less than 0, which the random forest model has never seen before in the training data.

They will be treated in the same manner as the minimal value already encountered in the training set. A random forest is just a bunch of voting decision trees, and (basic) decision trees can only form decisions of the form "if feature X > T, go left, otherwise go right". Consequently, if you fit it to data which, for a given feature, only has values in (0, inf), it will either not use this feature at all or use it in the form given above, with a threshold T that has to lie in (0, inf) to make any sense for the training data. Every negative value therefore falls on the same side of every such split as 0 does, so if you simply take your new data and change the negative values to 0, the result will be identical.
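A quick way to check this claim yourself is sketched below. It is only an illustration under assumptions: the data is synthetic, the two features and the target are made up, and `RandomForestRegressor` is used because the question does not say whether the model is a classifier or a regressor. It fits a forest on strictly positive features, then compares predictions on test data containing negatives against the same data with negatives clipped to 0.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic training data (an assumption): every feature value is > 0.
X_train = rng.uniform(0.1, 10.0, size=(200, 2))
y_train = 2.0 * X_train[:, 0] + np.sin(X_train[:, 1])

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Collect the split thresholds of every tree; leaf nodes are marked
# with a negative feature index, so they are filtered out.
thresholds = np.concatenate([
    est.tree_.threshold[est.tree_.feature >= 0]
    for est in model.estimators_
])
min_threshold = thresholds.min()

# Test data with negative values the model never saw during training.
X_test = rng.uniform(-5.0, 10.0, size=(100, 2))
X_clipped = np.clip(X_test, 0.0, None)  # replace negatives with 0

pred_raw = model.predict(X_test)
pred_clipped = model.predict(X_clipped)

# True if the negatives were handled exactly like 0.
same = np.array_equal(pred_raw, pred_clipped)
print("smallest split threshold:", min_threshold)
print("predictions identical after clipping:", same)
```

Because every threshold is learned from positive training values, a negative input and 0 take the same branch at every split, which is why the two prediction arrays should match.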


 