R - Random Forest - Delete New factor levels not present in the training data
I'm debugging some code that uses the Random Forest package, with almost no previous R experience.
I've reached a point where, when executing predict.randomForest, I get the error:
New factor levels not present in the training data.
Searching this site, I found the reason and understood that I need to delete the records that are causing the problem.
How can I isolate (find out) which columns/rows are causing the problem?
Assume you have train.data, which you used to build your model, test.data, which you now want predictions for, and a factor variable factor.var1. Then you could do:
levels(test.data$factor.var1) %in% levels(train.data$factor.var1)
This produces a logical vector with one entry per factor level in test.data; the FALSE entries mark the levels that were not present in train.data.
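To get from the level check to the actual columns and rows, the same comparison can be run over every factor column. The following is only a sketch: the toy data frames, the extra columns factor.var2 and num.var, and the helper names (new.levels, bad.cols, bad.rows, clean.test) are made up for illustration and are not from the original post.

```r
# Hypothetical toy data standing in for train.data and test.data.
train.data <- data.frame(
  factor.var1 = factor(c("a", "b", "a", "c")),
  factor.var2 = factor(c("x", "y", "x", "y")),
  num.var     = c(1.5, 2.0, 3.1, 4.2)
)
test.data <- data.frame(
  factor.var1 = factor(c("a", "d", "b")),
  factor.var2 = factor(c("x", "y", "z")),
  num.var     = c(0.7, 1.1, 2.9)
)

# Names of the factor columns present in both data frames.
factor.cols <- names(test.data)[sapply(test.data, is.factor)]
factor.cols <- intersect(factor.cols, names(train.data))

# For each factor column, the levels that exist in test.data only.
new.levels <- lapply(factor.cols, function(col) {
  setdiff(levels(test.data[[col]]), levels(train.data[[col]]))
})
names(new.levels) <- factor.cols

# Columns with at least one level that train.data does not have.
bad.cols <- names(new.levels)[lengths(new.levels) > 0]

# Rows of test.data whose value in any factor column is such a level.
bad.rows <- Reduce(`|`, lapply(factor.cols, function(col) {
  as.character(test.data[[col]]) %in% new.levels[[col]]
}), rep(FALSE, nrow(test.data)))

# Drop the offending rows, then drop the levels that are now unused
# (subsetting a factor keeps its full level set otherwise).
clean.test <- droplevels(test.data[!bad.rows, , drop = FALSE])
```

Inspecting new.levels shows which values are the problem in each column, and which(bad.rows) gives the row numbers to delete.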
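A simple solution is to rbind the test data onto the training data, run the prediction, and then subset the rows you wanted predictions for. This worked for me.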
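The answer does not spell out the steps, so the following is one possible reading of the bind-and-subset mechanics in base R only, with the model fitting and prediction calls left out. The toy data and the names combined, test.idx and test.aligned are hypothetical. It assumes the test factor only uses values seen in training but carries a different level set.

```r
# Hypothetical toy data: test.data uses only values seen in training,
# but its factor has a smaller level set ("a", "b" instead of "a", "b", "c").
train.data <- data.frame(
  factor.var1 = factor(c("a", "b", "c")),
  num.var     = c(1, 2, 3)
)
test.data <- data.frame(
  factor.var1 = factor(c("b", "a")),
  num.var     = c(4, 5)
)

# Binding the rows makes both parts share one level set per factor column.
combined <- rbind(train.data, test.data)

# Positions of the test rows inside the combined data frame.
test.idx <- nrow(train.data) + seq_len(nrow(test.data))

# Take the test rows back out; they keep the combined level set.
test.aligned <- combined[test.idx, , drop = FALSE]

levels(test.data$factor.var1)     # level set before binding
levels(test.aligned$factor.var1)  # level set after binding and subsetting
```

A row whose value never occurred in training would still bring its extra level into the combined factor, so this sketch only covers the case where the level sets differ but the values do not.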