Error in confusionMatrix: the data and reference factors must have the same number of levels

Question

I trained a tree model with R caret. Now I'm trying to generate a confusion matrix, and I keep getting the following error:

Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels

prob <- 0.5 #Specify class split
singleSplit <- createDataPartition(modellingData2$category, p=prob,
                                   times=1, list=FALSE)
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5)
traindata <- modellingData2[singleSplit,]
testdata <- modellingData2[-singleSplit,]
treeFit <- train(traindata$category~., data=traindata,
                 trControl=cvControl, method="rpart", tuneLength=10)
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)

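The error occurs when generating the confusion matrix. The levels are the same on both objects, and I can't figure out what the problem is. Their structure and levels are shown below; they should be identical. Any help would be much appreciated, because this is driving me crazy!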

> str(predictionsTree)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
> str(testdata$category)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...

> levels(predictionsTree)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"   

> levels(testdata$category)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"
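One difference between the failing call and the checks above: the call passes testdata$catgeory, while str() and levels() were run on testdata$category. The minimal base-R sketch below uses a small made-up data frame (not the asker's data) to show what a misspelled column name does. Assuming confusionMatrix() converts a non-factor reference with factor(), the reference would arrive with 0 levels against the predictions' 30.

```r
# Hypothetical stand-in for testdata; the values are made up for illustration
testdata <- data.frame(category = factor(c("Misc", "Uncategorised", "Misc")))

# `$` returns NULL (no error) for a column that does not exist, and
# "catgeory" is not a prefix of "category", so partial matching cannot rescue it
misspelled <- testdata$catgeory
correct <- testdata$category

print(is.null(misspelled))          # TRUE
print(nlevels(factor(misspelled)))  # 0
print(nlevels(correct))             # 2
```

With the column spelled category in the confusionMatrix() call, both arguments would carry the same 30 levels printed in the question.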

Answer 1

Try using:

confusionMatrix(table(predictionsTree, testdata$category))

That worked for me.
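table() cross-tabulates every level of both factors, including levels that never occur, so two factors that share one level set produce a square table. A small sketch with made-up three-class labels (the level names a, b and c are hypothetical), including the overall accuracy read off the diagonal:

```r
# Hypothetical shared level set and labels
lv <- c("a", "b", "c")
predicted <- factor(c("a", "a", "b", "b"), levels = lv)  # "c" is never predicted
actual <- factor(c("a", "b", "b", "c"), levels = lv)

# table() keeps unused levels, so the result is 3 x 3 with an all-zero "c" row
tab <- table(Prediction = predicted, Reference = actual)
print(tab)
print(dim(tab))  # 3 3

# Overall accuracy: correct predictions sit on the diagonal
accuracy <- sum(diag(tab)) / sum(tab)
print(accuracy)  # 0.5
```

With caret installed, confusionMatrix(tab) is the table-based call from this answer applied to such a square table.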

Answer 2

Perhaps your model never predicts one of the factor levels. Use the table() function instead of confusionMatrix() to check whether that is the problem.
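A short sketch of that check, with made-up labels: if the predictions are stored as character (or re-factored) and one class is never predicted, factor() yields one level fewer than the reference. setdiff() lists the classes that are never predicted, and passing levels = levels(actual) to factor() restores the full level set. The names actual and raw_pred are hypothetical.

```r
# Hypothetical reference labels with three classes
actual <- factor(c("a", "b", "c", "c"))

# Hypothetical raw predictions stored as character; class "c" is never predicted
raw_pred <- c("a", "b", "b", "a")

print(nlevels(factor(raw_pred)))  # 2, one fewer than the reference
print(nlevels(actual))            # 3

# Reference levels that never appear among the predictions
missing_levels <- setdiff(levels(actual), unique(raw_pred))
print(missing_levels)             # "c"

# Rebuild the predictions on the reference's level set
pred <- factor(raw_pred, levels = levels(actual))
print(table(pred))                # "c" now appears with a count of 0
```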

Answer 3

Try setting the na.action option to na.pass:

predictionsTree <- predict(treeFit, testdata, na.action = na.pass)
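The point of this answer is that rows containing NA can be dropped before prediction, which leaves fewer predictions than rows in testdata. The base-R sketch below, with a made-up two-column data frame, shows the difference between the two actions using model.frame(), used here only as an assumed stand-in for what happens to the new data inside predict(). Keeping the rows with na.pass only helps if the model itself can predict from rows with missing values.

```r
# Hypothetical data frame: row 2 lacks x, row 3 lacks y
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

# na.omit drops every row that contains an NA
kept_omit <- model.frame(~ x + y, data = df, na.action = na.omit)
# na.pass keeps all rows, NAs included
kept_pass <- model.frame(~ x + y, data = df, na.action = na.pass)

print(nrow(kept_omit))  # 1
print(nrow(kept_pass))  # 3
```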

Answer 4

Convert them to data frames, then pass them to the confusionMatrix function:

# Predictions and ground truth as factors
predicted <- factor(predict(treeFit, testdata))
real <- factor(testdata$category)

# Stack both into one data frame; rbind() merges the factor levels of the two inputs
my_data1 <- data.frame(data = predicted, type = "prediction")
my_data2 <- data.frame(data = real, type = "real")
my_data3 <- rbind(my_data1, my_data2)

# Check if the levels are identical
identical(levels(my_data3[my_data3$type == "prediction", 1]), levels(my_data3[my_data3$type == "real", 1]))

confusionMatrix(my_data3[my_data3$type == "prediction", 1], my_data3[my_data3$type == "real", 1], dnn = c("Prediction", "Reference"))
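This answer relies on rbind() expanding factor levels: when data frames with factor columns are stacked, the combined column carries the levels of both inputs, so subsetting it gives two factors with the same level set. A self-contained sketch with made-up labels (predicted and real are hypothetical stand-ins), followed by a shorter equivalent that builds the shared level set with union() directly:

```r
predicted <- factor(c("a", "b", "a"))   # never predicts "c"
real <- factor(c("a", "c", "b"))        # contains "c"

# Stack the two factors; rbind() combines their level sets
stacked <- rbind(
  data.frame(data = predicted, type = "prediction"),
  data.frame(data = real, type = "real")
)
print(levels(stacked$data))  # "a" "b" "c"

# Subsetting a factor keeps all of its levels
pred_aligned <- stacked$data[stacked$type == "prediction"]
real_aligned <- stacked$data[stacked$type == "real"]
print(identical(levels(pred_aligned), levels(real_aligned)))  # TRUE

# Shorter equivalent: re-level both factors on the union of their levels
shared <- union(levels(predicted), levels(real))
pred2 <- factor(predicted, levels = shared)
real2 <- factor(real, levels = shared)
print(identical(levels(pred2), levels(pred_aligned)))  # TRUE
```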

Answer 5

There may be missing values in testdata. Add the following line before "predictionsTree <- predict(treeFit, testdata)" to remove the rows with NAs. I had the same error, and this fixed it for me.

testdata <- testdata[complete.cases(testdata),]
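A minimal sketch of what that line does, using a hypothetical test set with one missing predictor value (the column name amount is made up): complete.cases() flags rows with no NA, and keeping only those rows means predictions and labels are taken from the same rows. na.omit() on a data frame drops the same rows.

```r
# Hypothetical test set with one missing predictor value
testdata <- data.frame(
  amount = c(1.5, NA, 2.0, 3.1),
  category = factor(c("a", "b", "a", "b"))
)

complete <- complete.cases(testdata)
print(complete)  # TRUE FALSE TRUE TRUE

# Keep only complete rows, so predictions and labels come from the same rows
testdata_clean <- testdata[complete, ]
print(nrow(testdata_clean))  # 3

# na.omit() on a data frame drops the same rows
print(nrow(na.omit(testdata)))  # 3

# Row subsetting keeps the factor's full level set
print(nlevels(testdata_clean$category))  # 2
```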

Answer 6

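The length problem you are seeing may be caused by NAs in the training set: either remove the incomplete cases or impute the missing values so that none remain.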
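For the imputation route, one simple option is median imputation of numeric columns. The base-R sketch below is an illustration swapped in for "impute", not something the answer specifies; the helper impute_median and the column amount are hypothetical. In practice the medians would be computed on the training set and reused for the test set, and caret's preProcess() also offers a "medianImpute" method.

```r
# Hypothetical helper: replace NA in every numeric column with that column's median;
# non-numeric columns (such as the factor outcome) are left untouched
impute_median <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      med <- median(df[[col]], na.rm = TRUE)
      df[[col]][is.na(df[[col]])] <- med
    }
  }
  df
}

# Hypothetical training data with one missing value in a numeric column
train_raw <- data.frame(
  amount = c(10, NA, 30, 50),
  category = factor(c("a", "b", "a", "b"))
)

train_imputed <- impute_median(train_raw)
print(train_imputed$amount)  # 10 30 30 50
print(anyNA(train_imputed))  # FALSE
```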

Answer 7

I had the same problem, but fixed it by modifying the data right after reading in the data file:

data = na.omit(data)

Thanks, everyone, for the pointers!

Answer 8

Make sure you have installed the package with all of its dependencies:

install.packages('caret', dependencies = TRUE)

confusionMatrix(table(prediction, true_value))

Answer 9

If your data contains NAs, they are sometimes treated as a factor level, so omit these NAs first:

DF = na.omit(DF)
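By default factor() does not make NA a level, but it becomes one after addNA() or factor(..., exclude = NULL), which is one way two factors can end up one level apart. A small base-R sketch with made-up values, including the effect of dropping the NAs first:

```r
# Hypothetical raw labels with one missing value
x <- c("a", NA, "b", "a")

f_default <- factor(x)
print(nlevels(f_default))   # 2: NA is not a level by default

f_with_na <- addNA(f_default)
print(nlevels(f_with_na))   # 3: NA is now a level
print(levels(f_with_na))    # "a" "b" NA

# factor(..., exclude = NULL) also keeps NA as a level
print(nlevels(factor(x, exclude = NULL)))  # 3

# Dropping the NAs first leaves no NA level to create
x_clean <- x[!is.na(x)]
print(nlevels(factor(x_clean, exclude = NULL)))  # 2
```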

Then, if the levels predicted by your fitted model are not correct, it is better to use a table:

confusionMatrix(table(Arg1, Arg2))

Answer 10

I just ran into the same problem, and I solved it by using R's ordered factor data type.

lvls <- levels(predictionsTree)
lvls <- lvls[order(lvls)]    # same as sort(lvls)
table(ordered(predictionsTree, lvls), ordered(testdata$category, lvls))
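This answer builds the level set from levels(predictionsTree) alone. The sketch below, with made-up labels, shows a consequence of that choice and a variant that sorts the union of both level sets instead (the variant is not taken from the answer): any reference value outside the chosen levels becomes NA and silently drops out of the table.

```r
# Hypothetical labels: the reference contains "c", the predictions do not
predicted <- factor(c("b", "a", "b"))
actual <- factor(c("a", "c", "b"))

# Using only the predictions' levels turns the unseen "c" into NA,
# so that observation is not counted in the table
narrow <- ordered(actual, sort(levels(predicted)))
print(sum(is.na(narrow)))  # 1

# Using the sorted union of both level sets keeps every observation
lvls <- sort(union(levels(predicted), levels(actual)))
tab <- table(Prediction = ordered(predicted, lvls),
             Reference = ordered(actual, lvls))
print(dim(tab))  # 3 3
print(sum(tab))  # 3
```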


Question description

10 solutions

Solution 1: 22, 2018-06-03 11:14:28

Solution 2: 5, 2014-10-31 05:36:44

Solution 3: 3, 2015-11-12 03:02:11

Solution 4: 2, 2018-08-09 05:46:27

Solution 5: 0, 2015-01-11 07:12:01

Solution 6: 0, 2015-05-21 21:06:38

Solution 7: 0, 2015-11-21 18:54:00

Solution 8: 0, 2019-06-24 19:58:52

Solution 9: 0, 2019-07-17 13:03:56

Solution 10: 0, 2020-12-08 19:06:11
