简体   繁体   中英

Can training set be used to determine variable importance using randomForest in R although the prediction of testing set is quite low?

I am using randomForest in R, I have a training model with R^2 of 0.94 , however , the prediction capacity for testing data is quite low. I would like to know if I can still use this training model only for determining which variable is more important/effective for output prediction.

Thanks

Based on what little information you provide, the question is hard to answer (think about providing more detail and background). Low prediction quality can result from wrong algorithm tuning, or it can be inherent in the data, ie your predictors themselves are not very strongly related to the outcome. In the first case, the prediction could be better with different parameters, eg more or less trees, different values for mtry, etc. If this is the case, then your importance measures are just as biased as your prediction (and should be used with caution). If the predictors themselves are weak, that means that your low quality prediction is as good as it gets. In this case, I would say the importance measures can be used, but they only tell you which of your overall weak predictors are more or less weak.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM