
Random forest: error in dealing with factor levels in R

I am using a random forest (rf) model in R to predict a binary outcome, 0 or 1. My input data contains categorical variables (coded as numbers), which I convert to factors with factor() before training. For every categorical variable x, my code looks like this:

feature_x1=factor(feature_x1) # Convert the variable into factor in training data. 
#This variable takes 3 levels 0,1,2

This works fine when training the model. Assume my model object is rf_model. When I run the model on new data, which is just a vector of numbers, I first convert the value for feature_x1 into a factor:

newdata=data.frame(1,2)
colnames(newdata)=c("feature_x1","feature_x2")
newdata$feature_x1=factor(newdata$feature_x1)
score=predict(rf_model,newdata,type="prob")

I receive the following error:

Error in predict.randomForest(rf_model, newdata,type = "prob") : New factor levels not present in the training data

How should I deal with this error? In practice, after training the model we will always be scoring data whose outcome is unknown, often just a single record.

Please let me know if more clarity or code is required.

Try building the factor with the same levels that were used in training:

newdata$feature_x1 <- factor(newdata$feature_x1, levels=levels(feature_x1))
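To see why passing levels helps, here is a minimal sketch. The training data frame train, its values, and the outcome column y are made up for illustration and are not from the question. Calling factor() on a single value keeps only the level it sees, so the new record's level set differs from the one used in training; reusing the training levels restores the full set. The last part only runs if the randomForest package is installed.

```r
# Hypothetical training data; the values and the outcome y are invented for illustration.
set.seed(1)
train <- data.frame(
  feature_x1 = factor(sample(0:2, 100, replace = TRUE)),  # levels "0", "1", "2"
  feature_x2 = runif(100),
  y          = factor(sample(0:1, 100, replace = TRUE))
)

# A single new record, built the way the question builds it.
newdata <- data.frame(feature_x1 = 1, feature_x2 = 2)

# factor() on one value only knows the level it sees.
levels(factor(newdata$feature_x1))   # "1"

# Reusing the training levels keeps the full level set.
newdata$feature_x1 <- factor(newdata$feature_x1,
                             levels = levels(train$feature_x1))
levels(newdata$feature_x1)           # "0" "1" "2"

# Optional: fit and score, only if randomForest is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf_model <- randomForest::randomForest(y ~ feature_x1 + feature_x2, data = train)
  score <- predict(rf_model, newdata, type = "prob")
  print(score)
}
```

If the training factor is no longer in memory at scoring time, one option is to save levels(train$feature_x1) alongside the model and pass that saved vector as levels.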
