
R - Random Forest - Delete new factor levels not present in the training data

I'm debugging code that uses the randomForest package, and I have almost no previous R experience.

I've reached a point where, when executing predict.randomForest, I get the error:

New factor levels not present in the training data.

Searching this site, I found the cause and understood that I need to delete the records that are causing the problem.

How can I find out which columns/rows are causing the problem?

Assume you have train.data, which you used to build your model; test.data, which you now want predictions for; and a factor variable factor.var1. Then you could do:

levels(test.data$factor.var1) %in% levels(train.data$factor.var1)

This produces a logical vector with one entry per factor level of test.data$factor.var1; the FALSE entries mark the levels that were not present in train.data.
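To go from levels to the actual rows, the same comparison can be extended a little. The following is a minimal sketch in base R, assuming toy versions of train.data and test.data with made-up values; new.levels, bad.rows, keep and clean.test are names introduced here for illustration only.

```r
# Toy stand-ins for train.data and test.data (hypothetical values).
train.data <- data.frame(
  factor.var1 = factor(c("a", "b", "a", "c")),
  x = c(1, 2, 3, 4)
)
test.data <- data.frame(
  factor.var1 = factor(c("a", "d", "b", "e")),
  x = c(5, 6, 7, 8)
)

# Levels that exist in test.data but not in train.data.
new.levels <- setdiff(levels(test.data$factor.var1),
                      levels(train.data$factor.var1))

# Row numbers in test.data that carry one of those levels.
bad.rows <- which(test.data$factor.var1 %in% new.levels)

# Keep only the rows whose level was seen in training. A logical index
# is used so that nothing is dropped when there are no offending rows.
keep <- !(test.data$factor.var1 %in% new.levels)
clean.test <- test.data[keep, , drop = FALSE]

# Re-declare the factor with the training levels, so the unseen levels
# no longer appear in levels(clean.test$factor.var1).
clean.test$factor.var1 <- factor(clean.test$factor.var1,
                                 levels = levels(train.data$factor.var1))

print(new.levels)
print(bad.rows)
```

Note that levels() also reports levels that are declared but unused, so a level can show up as new even when no row carries it; in that case bad.rows is empty and only the re-declaration of the factor changes anything. To find the offending columns, the same setdiff() check can be repeated for each factor column.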

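A simple solution is to bind the test data to the training data, run the prediction, and then subset the rows you wanted predictions for. This worked for me.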
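The answer does not say at which point the model is fitted, so the sketch below only illustrates what binding does to the factor levels, using rbind() on toy data with made-up values. The names combined, train.idx, train2 and test2 are hypothetical, and the model fitting and prediction steps are deliberately left out.

```r
# Toy stand-ins for train.data and test.data (hypothetical values).
train.data <- data.frame(
  factor.var1 = factor(c("a", "b", "a", "c")),
  x = c(1, 2, 3, 4)
)
test.data <- data.frame(
  factor.var1 = factor(c("a", "d", "b", "e")),
  x = c(5, 6, 7, 8)
)

# rbind() on data frames merges the factor levels of both inputs.
combined <- rbind(train.data, test.data)

# Split back into the original parts; both now share one level set.
train.idx <- seq_len(nrow(train.data))
train2 <- combined[train.idx, , drop = FALSE]
test2  <- combined[-train.idx, , drop = FALSE]

print(levels(train2$factor.var1))
print(identical(levels(train2$factor.var1), levels(test2$factor.var1)))
```

After the split, both parts declare the same levels. Whether this is enough to avoid the error presumably depends on the model being fitted on data that already carries the combined levels, which is an assumption here and not something the answer states.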
