Random forest: error in dealing with factor levels in R
I am using an rf (random forest) model in R to predict a binary outcome, 0 or 1. My input data contains categorical variables (coded as numbers), which are coded as factors for training. I use the factor() function in R to convert each variable to a factor. So for every categorical variable x, my code looks like this:
feature_x1=factor(feature_x1) # Convert the variable into factor in training data.
#This variable takes 3 levels 0,1,2
This works perfectly well while training the model. Let us assume my model object is rf_model. When I run the model on new data, which is just a vector of numbers, I first convert the number to a factor for feature_x1:
newdata=data.frame(1,2)
colnames(newdata)=c("feature_x1","feature_x2")
newdata$feature_x1=factor(newdata$feature_x1)
score=predict(rf_model,newdata,type="prob")
I receive the following error:
Error in predict.randomForest(rf_model, newdata,type = "prob") : New factor levels not present in the training data
How should I deal with this error? In practice, after training the model we will always have to score data for which the outcome is unknown and which is just a single record.
Please let me know if more clarity or code is required.
Try:
newdata$feature_x1 <- factor(newdata$feature_x1, levels=levels(feature_x1))
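To see why this helps, here is a minimal sketch under some assumptions: the training data frame `train`, its random contents, and the variable `naive` are made up for illustration and are not from the question. Calling factor() on a single record builds a factor whose only level is the value it happens to contain, while passing the training levels gives the new factor the same level set the model was trained on. The model fit at the end runs only if the randomForest package is installed.

```r
# Hypothetical training data (an assumption for illustration):
# feature_x1 takes the levels 0, 1, 2, as in the question.
set.seed(42)
train <- data.frame(
  feature_x1 = factor(sample(0:2, 100, replace = TRUE)),
  feature_x2 = rnorm(100),
  outcome    = factor(sample(0:1, 100, replace = TRUE))
)

# A single new record, as in the question.
newdata <- data.frame(feature_x1 = 1, feature_x2 = 2)

# factor() on its own sees only the value 1, so the result has the single level "1".
naive <- factor(newdata$feature_x1)
levels(naive)

# Reusing the training levels gives the new factor the same level set as in training.
newdata$feature_x1 <- factor(newdata$feature_x1,
                             levels = levels(train$feature_x1))
levels(newdata$feature_x1)

# Fit and score only when the randomForest package is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf_model <- randomForest::randomForest(outcome ~ ., data = train)
  score <- predict(rf_model, newdata, type = "prob")
  print(score)
}
```

If the training data is not available at scoring time, one option is to save levels(train$feature_x1) alongside the model (for example with saveRDS()) and pass the saved vector as the levels argument.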