Error in ConfusionMatrix: the data and reference factors must have the same number of levels
I've trained a tree model with R caret. I'm now trying to generate a confusion matrix and keep getting the following error:
Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels
prob <- 0.5  # Specify class split
singleSplit <- createDataPartition(modellingData2$category, p = prob,
                                   times = 1, list = FALSE)
cvControl <- trainControl(method = "repeatedcv", number = 10, repeats = 5)
traindata <- modellingData2[singleSplit, ]
testdata <- modellingData2[-singleSplit, ]
treeFit <- train(traindata$category ~ ., data = traindata,
                 trControl = cvControl, method = "rpart", tuneLength = 10)
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)
The error occurs when generating the confusion matrix. The levels are the same on both objects. I can't figure out what the problem is. Their structure and levels are given below. They should be the same. Any help would be greatly appreciated, as it's driving me crazy!
> str(predictionsTree)
Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
> str(testdata$category)
Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...
> levels(predictionsTree)
[1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee" "18-Gov. Stamp Duty" "Misc" "26-Standard Transfer Charge"
[6] "29-Bank Giro Credit" "3-Cheques Debit" "32-Standing Order - Debit" "33-Inter Branch Payment" "34-International"
[11] "35-Point of Sale" "39-Direct Debits Received" "4-Notified Bank Fees" "40-Cash Lodged" "42-International Receipts"
[16] "46-Direct Debits Paid" "56-Credit Card Receipts" "57-Inter Branch" "58-Unpaid Items" "59-Inter Company Transfers"
[21] "6-Notified Interest Credited" "61-Domestic" "64-Charge Refund" "66-Inter Company Transfers" "67-Suppliers"
[26] "68-Payroll" "69-Domestic" "73-Credit Card Payments" "82-CHAPS Fee" "Uncategorised"
> levels(testdata$category)
[1] "16-Merchant Service Charge" "17-Unpaid Cheque Fee" "18-Gov. Stamp Duty" "Misc" "26-Standard Transfer Charge"
[6] "29-Bank Giro Credit" "3-Cheques Debit" "32-Standing Order - Debit" "33-Inter Branch Payment" "34-International"
[11] "35-Point of Sale" "39-Direct Debits Received" "4-Notified Bank Fees" "40-Cash Lodged" "42-International Receipts"
[16] "46-Direct Debits Paid" "56-Credit Card Receipts" "57-Inter Branch" "58-Unpaid Items" "59-Inter Company Transfers"
[21] "6-Notified Interest Credited" "61-Domestic" "64-Charge Refund" "66-Inter Company Transfers" "67-Suppliers"
[26] "68-Payroll" "69-Domestic" "73-Credit Card Payments" "82-CHAPS Fee" "Uncategorised"
Try using:
confusionMatrix(table(predictionsTree, testdata$category))
That worked for me.
Maybe your model is not predicting a certain factor level. Use the table() function instead of confusionMatrix() to see if that is the problem.
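A minimal sketch of that check in base R, using small made-up stand-ins for `predictionsTree` and `testdata` (the data here is hypothetical, not from the question). It also checks the reference column itself: `$` on a data frame returns NULL for a column name that does not exist, so a misspelled name such as `catgeory` would give a reference with zero levels, which may be worth ruling out first.

```r
# Toy stand-ins for the objects in the question (hypothetical data).
testdata <- data.frame(
  category = factor(c("a", "b", "c", "a"), levels = c("a", "b", "c"))
)
predictionsTree <- factor(c("a", "b", "b", "a"), levels = c("a", "b", "c"))

# A misspelled column name silently yields NULL, which has no levels.
print(is.null(testdata$catgeory))
print(length(levels(testdata$catgeory)))

# With the correct name, the cross-tabulation shows which levels the
# model never predicts: their rows contain only zeros.
tab <- table(Prediction = predictionsTree, Reference = testdata$category)
print(tab)
never_predicted <- rownames(tab)[rowSums(tab) == 0]
print(never_predicted)
```

If the reference turns out to be NULL or shorter than the predictions, the problem is in the inputs rather than in the model.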
Try specifying na.pass for the na.action option:
predictionsTree <- predict(treeFit, testdata,na.action = na.pass)
Convert them into a data frame and then use them in the confusionMatrix function:
predicted <- factor(predict(treeFit, testdata))
real <- factor(testdata$category)
my_data1 <- data.frame(data = predicted, type = "prediction")
my_data2 <- data.frame(data = real, type = "real")
my_data3 <- rbind(my_data1, my_data2)
# Check if the levels are identical
identical(levels(my_data3[my_data3$type == "prediction", 1]), levels(my_data3[my_data3$type == "real", 1]))
confusionMatrix(my_data3[my_data3$type == "prediction", 1], my_data3[my_data3$type == "real", 1], dnn = c("Prediction", "Reference"))
There might be missing values in testdata. Add the following line before "predictionsTree <- predict(treeFit, testdata)" to remove the NAs. I had the same error and now it works for me.
testdata <- testdata[complete.cases(testdata),]
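The length problem you are seeing may be caused by NAs in the training set. Either remove the incomplete cases or impute them so that you have no missing values.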
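To see whether missing values are involved at all, it can help to count them before filtering. The sketch below uses base R only and a made-up `testdata` (the column names `amount` and `channel` are hypothetical); it is an illustration of the check, not the original poster's data.

```r
# Hypothetical test set with missing predictor values.
testdata <- data.frame(
  amount   = c(10, NA, 30, 40),
  channel  = factor(c("web", "branch", NA, "web")),
  category = factor(c("a", "b", "a", "b"))
)

# Count missing values per column to see where the NAs are.
na_per_column <- colSums(is.na(testdata))
print(na_per_column)

# Keep only rows with no missing values.
testdata_complete <- testdata[complete.cases(testdata), ]

# Number of rows that contained at least one NA.
n_dropped <- nrow(testdata) - nrow(testdata_complete)
print(n_dropped)
```

After predicting, comparing length(predictionsTree) with nrow(testdata) is a quick way to confirm that both arguments to confusionMatrix() cover the same rows.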
I had the same issue and fixed it by dropping the NAs right after reading the data file, like so:
data = na.omit(data)
Thanks all for the pointers!
Make sure you installed the package with all its dependencies:
install.packages('caret', dependencies = TRUE)
confusionMatrix( table(prediction, true_value) )
If your data contains NAs, they are sometimes treated as a factor level, so omit these NAs first:
DF = na.omit(DF)
Then, if your model predicts levels that do not match the reference, it is better to use a table:
confusionMatrix(table(Arg1, Arg2))
I just ran into the same problem. I solved it by using R's ordered factor data type.
levels <- levels(predictionsTree)
levels <- levels[order(levels)]
table(ordered(predictionsTree, levels), ordered(testdata$category, levels))
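A related sketch, assuming the two factors really do have different level sets: rebuild both on the union of their levels so that the table is square. The vectors `pred` and `ref` are hypothetical examples, and this uses plain `factor()` rather than `ordered()`, so it is a variation on the approach above rather than the same method.

```r
# Hypothetical predictions and reference with different level sets.
pred <- factor(c("a", "b", "a"))
ref  <- factor(c("a", "c", "b"))

# Build one shared, sorted level set from both factors.
all_levels <- sort(union(levels(pred), levels(ref)))

# Re-create both factors on the shared levels; values are unchanged.
pred_aligned <- factor(pred, levels = all_levels)
ref_aligned  <- factor(ref, levels = all_levels)

print(identical(levels(pred_aligned), levels(ref_aligned)))

# The resulting table is square, with zero counts for unused levels.
tab <- table(Prediction = pred_aligned, Reference = ref_aligned)
print(tab)
```

The aligned factors can then be passed to confusionMatrix() directly or as a table.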