Error in confusionMatrix: the data and reference factors must have the same number of levels

Question

I've trained a tree model with R caret. I'm now trying to generate a confusion matrix and keep getting the following error:

Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

prob <- 0.5 #Specify class split
singleSplit <- createDataPartition(modellingData2$category, p=prob,
                                   times=1, list=FALSE)
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5)
traindata <- modellingData2[singleSplit,]
testdata <- modellingData2[-singleSplit,]
treeFit <- train(traindata$category~., data=traindata,
                 trControl=cvControl, method="rpart", tuneLength=10)
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)

The error occurs when generating the confusion matrix. The levels are the same on both objects. I can't figure out what the problem is. Their structure and levels are given below; they should be the same. Any help would be greatly appreciated, as it's driving me crazy!

> str(predictionsTree)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
> str(testdata$category)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...

> levels(predictionsTree)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"   

> levels(testdata$category)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"

Answer 1

Try using:

confusionMatrix(table(predictionsTree, testdata$category))

That worked for me.

Answer 2

Maybe your model is not predicting a certain factor level. Use the table() function instead of confusionMatrix() to see if that is the problem.
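A minimal sketch of that kind of check, using toy stand-ins for the objects in the question (the data here is hypothetical; only the names `predictionsTree`, `testdata` and `category` come from the post). It is worth noting that the failing call in the question passes `testdata$catgeory`, while the `str()` output shows `testdata$category`; in base R, `$` on a data frame returns NULL for a column name that does not exist, which would give a reference with zero levels. Whether that is the actual cause here is an assumption.

```r
# Toy stand-ins for the objects in the question (hypothetical data).
testdata <- data.frame(
  category = factor(c("a", "b", "c", "a"), levels = c("a", "b", "c"))
)
predictionsTree <- factor(c("a", "b", "b", "a"), levels = c("a", "b", "c"))

# A misspelled column name yields NULL rather than an error.
misspelled <- testdata$catgeory
is.null(misspelled)   # TRUE
nlevels(misspelled)   # 0

# Checks to run on both arguments before calling confusionMatrix():
same_count   <- nlevels(predictionsTree) == nlevels(testdata$category)
missing_lvls <- setdiff(levels(testdata$category), levels(predictionsTree))
same_length  <- length(predictionsTree) == nrow(testdata)

# The cross-tabulation shows classes the model never predicts as all-zero rows.
tab <- table(Prediction = predictionsTree, Reference = testdata$category)
print(tab)
```

If `same_count` is FALSE or `missing_lvls` is non-empty, the two factors really do differ; if either argument is NULL, the column name is the first thing to check.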

Answer 3

Try specifying na.pass for the na.action option:

predictionsTree <- predict(treeFit, testdata, na.action = na.pass)

Answer 4

Change them into a data frame and then use them in the confusionMatrix function:

predicted <- factor(predict(treeFit, testdata))
real <- factor(testdata$category)

my_data1 <- data.frame(data = predicted, type = "prediction")
my_data2 <- data.frame(data = real, type = "real")
# rbind() merges the factor levels, so both subsets share one level set
my_data3 <- rbind(my_data1, my_data2)

# Check if the levels are identical
identical(levels(my_data3[my_data3$type == "prediction", 1]),
          levels(my_data3[my_data3$type == "real", 1]))

confusionMatrix(my_data3[my_data3$type == "prediction", 1],
                my_data3[my_data3$type == "real", 1],
                dnn = c("Prediction", "Reference"))

Answer 5

There might be missing values in testdata. Add the following line before "predictionsTree <- predict(treeFit, testdata)" to remove the NAs. I had the same error and now it works for me.

testdata <- testdata[complete.cases(testdata),]
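A small sketch of why this can matter, on a hypothetical test set (the `amount` column is invented for illustration). If incomplete rows are dropped at prediction time, the prediction vector ends up shorter than the reference column, so counting the NAs and comparing lengths first is a cheap check.

```r
# Hypothetical test set with one missing predictor value.
testdata <- data.frame(
  amount   = c(10, NA, 25, 40),
  category = factor(c("a", "b", "a", "b"))
)

# Count missing values per column to see where the NAs are.
na_counts <- colSums(is.na(testdata))

# Keep only the rows with no missing values in any column.
testdata_clean <- testdata[complete.cases(testdata), ]

nrow(testdata)        # 4
nrow(testdata_clean)  # 3
```

After filtering, predictions and `testdata_clean$category` are computed from the same rows, so their lengths should match.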

Answer 6

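The length problem you are seeing may be caused by NAs in the training set. Either remove the incomplete cases or impute them so that you have no missing values.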

Answer 7

I had the same issue, and fixed it by dropping the NAs right after reading in the data file, like so:

data = na.omit(data)

Thanks all for the pointers!

Answer 8

Make sure you installed the package with all its dependencies:

install.packages('caret', dependencies = TRUE)

confusionMatrix( table(prediction, true_value) )

Answer 9

If your data contains NAs, they will sometimes be treated as a factor level, so omit the NAs first:

DF = na.omit(DF)

Then, if your model fit is predicting some incorrect level, it is better to use tables:

confusionMatrix(table(Arg1, Arg2))
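To illustrate the point about NA becoming a level, here is a base R sketch on made-up values. By default `factor()` excludes NA from the levels, but `exclude = NULL` (or `addNA()`) keeps it as an extra level, which is one way two factors built from the same categories can end up with different level counts.

```r
x <- c("a", NA, "b")

f_default <- factor(x)                  # NA is excluded from the levels
f_with_na <- factor(x, exclude = NULL)  # NA is kept as an extra level

nlevels(f_default)  # 2
nlevels(f_with_na)  # 3

# Dropping the NA entries and unused levels restores the two-level factor.
f_clean <- droplevels(f_with_na[!is.na(f_with_na)])
nlevels(f_clean)    # 2
```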

Answer 10

I just ran into the same problem. I solved it by using R's ordered factor data type:

lvls <- sort(levels(predictionsTree))
table(ordered(predictionsTree, lvls), ordered(testdata$category, lvls))
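The same idea can be sketched with plain (unordered) factors: rebuild both vectors on one shared level set so that the cross-tabulation is square. The data below is hypothetical, and using the union of both level sets is an assumption of this sketch rather than something the answer specifies.

```r
pred <- factor(c("a", "b", "a"))  # the model never predicted "c"
ref  <- factor(c("a", "b", "c"))

# Rebuild both factors on the union of their levels, in a fixed order.
all_levels   <- sort(union(levels(pred), levels(ref)))
pred_aligned <- factor(pred, levels = all_levels)
ref_aligned  <- factor(ref,  levels = all_levels)

tab <- table(Prediction = pred_aligned, Reference = ref_aligned)
dim(tab)  # 3 3
```

With both factors sharing `all_levels`, the table has the same classes in the same order on both axes, which is the shape the table-based answers above pass to confusionMatrix().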

10 answers

Answer 1: 22, 2018-06-03 11:14:28
Answer 2: 5, 2014-10-31 05:36:44
Answer 3: 3, 2015-11-12 03:02:11
Answer 4: 2, 2018-08-09 05:46:27
Answer 5: 0, 2015-01-11 07:12:01
Answer 6: 0, 2015-05-21 21:06:38
Answer 7: 0, 2015-11-21 18:54:00
Answer 8: 0, 2019-06-24 19:58:52
Answer 9: 0, 2019-07-17 13:03:56
Answer 10: 0, 2020-12-08 19:06:11
