R - Random Forest - Removing new factor levels not present in the training data

Question

I'm debugging code that uses the randomForest package, with almost no previous R experience.

I've reached a point where, when executing predict.randomForest, I get the error:

New factor levels not present in the training data.

Searching this site, I've found the reason and understood that I need to delete the records that are causing the problem.

How can I isolate (find out) which columns/rows are causing the problem?

Answer 1

Assume you have train.data, which you used to build your model, test.data, which you now want predictions for, and a factor variable factor.var1. Then you could do:

levels(test.data$factor.var1) %in% levels(train.data$factor.var1)

This produces a logical vector with one entry per factor level in test.data; the FALSE entries mark the levels that were not present in train.data.
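To answer the "which columns/rows" part of the question, the same check can be repeated over every factor column. The following is only a sketch in base R under stated assumptions: the toy data, num.var, factor.var2 and helper names such as new.levels, bad.cols and bad.rows are illustrative and not part of the original answer. It also re-levels the remaining test rows, because unused levels left behind after subsetting can still differ from the training levels.

```r
# Hypothetical toy data; train.data and test.data stand in for your own frames.
train.data <- data.frame(
  factor.var1 = factor(c("a", "b", "a", "c")),
  factor.var2 = factor(c("x", "y", "x", "y")),
  num.var     = c(1.0, 2.5, 3.1, 4.2)
)
test.data <- data.frame(
  factor.var1 = factor(c("a", "d", "b")),
  factor.var2 = factor(c("x", "y", "z")),
  num.var     = c(0.5, 1.5, 2.5)
)

# Names of the factor columns in the training data.
factor.cols <- names(train.data)[sapply(train.data, is.factor)]

# For each factor column, collect the test values that are not training levels.
new.levels <- lapply(factor.cols, function(col) {
  vals <- as.character(test.data[[col]])
  setdiff(unique(vals[!is.na(vals)]), levels(train.data[[col]]))
})
names(new.levels) <- factor.cols

# Columns that contain at least one unseen level.
bad.cols <- names(new.levels)[lengths(new.levels) > 0]

# Rows that contain an unseen level in any factor column.
bad.row.flags <- Reduce(`|`, lapply(factor.cols, function(col) {
  as.character(test.data[[col]]) %in% new.levels[[col]]
}), rep(FALSE, nrow(test.data)))
bad.rows <- which(bad.row.flags)

# Drop the offending rows and give the rest the training level sets.
test.clean <- test.data[!bad.row.flags, , drop = FALSE]
for (col in factor.cols) {
  test.clean[[col]] <- factor(as.character(test.clean[[col]]),
                              levels = levels(train.data[[col]]))
}
```

Printing new.levels shows the unseen values per column, bad.rows gives the row indices to inspect or delete, and test.clean is what would be passed to predict.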

Answer 2

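A simple solution is to bind the test data to the training data, run the prediction, and then subset the rows you wanted predictions for. This worked for me.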
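The binding step might look like the sketch below, in base R. The toy data and the names combined, test.idx, test.aligned and rf.model are assumptions made for illustration. Note that rbind merges the factor level sets, so this aligns the test rows with the training levels when the test values are a subset of them. A level that never occurred in training would still end up in the merged set, so this may not help in that case.

```r
# Hypothetical toy data; the test levels are a subset of the training levels.
train.data <- data.frame(
  factor.var1 = factor(c("a", "b", "c", "a")),
  num.var     = c(1, 2, 3, 4)
)
test.data <- data.frame(
  factor.var1 = factor(c("b", "a")),
  num.var     = c(5, 6)
)

# Stack training rows first, then test rows; rbind merges the factor levels.
combined <- rbind(train.data, test.data)

# Positions of the test rows inside the combined frame.
test.idx <- seq(nrow(train.data) + 1, nrow(combined))

# The test rows now carry the same level set as the training data.
test.aligned <- combined[test.idx, , drop = FALSE]

# With a fitted model (rf.model is a hypothetical name), one would predict on
# the combined frame and keep only the predictions for the test rows:
# preds <- predict(rf.model, newdata = combined)[test.idx]
```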

Answer 1
4, accepted, 2015-08-13 14:30:20

Answer 2
0, 2015-11-04 13:13:50
