
Error in confusionMatrix: the data and reference factors must have the same number of levels

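I've trained a tree model with R caret. I'm now trying to generate a confusion matrix and keep getting the following error:

Error in confusionMatrix.default(predictionsTree, testdata$catgeory) : the data and reference factors must have the same number of levels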

prob <- 0.5 #Specify class split
singleSplit <- createDataPartition(modellingData2$category, p=prob,
                                   times=1, list=FALSE)
cvControl <- trainControl(method="repeatedcv", number=10, repeats=5)
traindata <- modellingData2[singleSplit,]
testdata <- modellingData2[-singleSplit,]
treeFit <- train(traindata$category~., data=traindata,
                 trControl=cvControl, method="rpart", tuneLength=10)
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)

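The error occurs when generating the confusion matrix. The levels are the same on both objects, and I can't figure out what the problem is. Their structure and levels are shown below. They should be the same. Any help would be greatly appreciated, as this is driving me crazy!!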

> str(predictionsTree)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 28 22 22 22 22 6 6 6 6 6 ...
> str(testdata$category)
 Factor w/ 30 levels "16-Merchant Service Charge",..: 30 30 7 7 7 7 7 30 7 7 ...

> levels(predictionsTree)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"   

> levels(testdata$category)
 [1] "16-Merchant Service Charge"   "17-Unpaid Cheque Fee"         "18-Gov. Stamp Duty"           "Misc"                         "26-Standard Transfer Charge" 
 [6] "29-Bank Giro Credit"          "3-Cheques Debit"              "32-Standing Order - Debit"    "33-Inter Branch Payment"      "34-International"            
[11] "35-Point of Sale"             "39-Direct Debits Received"    "4-Notified Bank Fees"         "40-Cash Lodged"               "42-International Receipts"   
[16] "46-Direct Debits Paid"        "56-Credit Card Receipts"      "57-Inter Branch"              "58-Unpaid Items"              "59-Inter Company Transfers"  
[21] "6-Notified Interest Credited" "61-Domestic"                  "64-Charge Refund"             "66-Inter Company Transfers"   "67-Suppliers"                
[26] "68-Payroll"                   "69-Domestic"                  "73-Credit Card Payments"      "82-CHAPS Fee"                 "Uncategorised"       

Try using:

confusionMatrix(table(predictionsTree, testdata$category))

That worked for me.
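The table form still needs both factors to share one set of levels. As far as I know, confusionMatrix() on a table expects a square table with the same classes in the same order. Here is a minimal sketch, using made-up toy vectors pred and truth (hypothetical names, not from the question), that re-declares both factors over a shared level set before tabulating. The caret call is guarded so the snippet still runs if caret or e1071 (which caret appears to rely on for this function) is not installed.

```r
# Toy data (hypothetical): the model never predicts class "c".
truth <- factor(c("a", "b", "c", "a", "b", "c"))
pred  <- factor(c("a", "b", "a", "a", "b", "b"))

nlevels(pred)   # 2
nlevels(truth)  # 3

# Re-declare both factors over the union of their levels.
all_levels    <- sort(union(levels(pred), levels(truth)))
pred_aligned  <- factor(pred,  levels = all_levels)
truth_aligned <- factor(truth, levels = all_levels)

# The cross-tabulation is now square, with matching row and column names.
tab <- table(Prediction = pred_aligned, Reference = truth_aligned)
print(tab)

# Only attempted when the packages are available.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("e1071", quietly = TRUE)) {
  print(caret::confusionMatrix(tab))
}
```

The all-zero row for "c" in the table shows directly that this class is never predicted.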

Maybe your model never predicts one of the factor levels. Use table() instead of confusionMatrix() to see whether that is the problem.
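A few base-R checks can narrow this down before calling confusionMatrix(). The sketch below uses a toy data frame toy and a toy prediction vector pred (both hypothetical). It also covers one easy-to-miss case: `$` on a data frame returns NULL for a column name that does not exist, so a misspelled column name gives a reference with zero levels.

```r
# Toy stand-ins (hypothetical names and values).
toy  <- data.frame(category = factor(c("x", "y", "z", "x")))
pred <- factor(c("x", "y", "x", "x"), levels = levels(toy$category))

# 1. Does the reference column exist? A misspelled name yields NULL.
is.null(toy$catgeory)   # TRUE: no such column
is.null(toy$category)   # FALSE

# 2. Same number of levels, and the same level names?
nlevels(pred) == nlevels(toy$category)
setdiff(levels(toy$category), levels(pred))   # levels missing from pred

# 3. Which classes are never predicted? Look for all-zero rows.
tab <- table(Prediction = pred, Reference = toy$category)
never_predicted <- rownames(tab)[rowSums(tab) == 0]
never_predicted
```

If the first check returns TRUE for the column you pass as the reference, the level comparison cannot succeed, no matter what the model predicts.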

Try specifying na.pass for the na.action option:

predictionsTree <- predict(treeFit, testdata, na.action = na.pass)
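To see why the NA handling can matter: if rows with missing predictors are dropped at prediction time, the prediction vector ends up shorter than the reference column. The base-R sketch below only illustrates the two na.action functions on a toy data frame (hypothetical data). It does not call caret or predict().

```r
# Toy test set (hypothetical) with one missing predictor value.
toy_test <- data.frame(
  amount   = c(10, NA, 30, 40),
  category = factor(c("a", "b", "a", "b"))
)

kept_omit <- na.omit(toy_test)  # drops the row containing NA
kept_pass <- na.pass(toy_test)  # returns the data unchanged

nrow(kept_omit)  # 3
nrow(kept_pass)  # 4

# With a row dropped, predictions would no longer line up with the reference.
nrow(kept_omit) == length(toy_test$category)  # FALSE
nrow(kept_pass) == length(toy_test$category)  # TRUE
```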

Convert them to a data frame, then use them in the confusionMatrix function:

predicted <- factor(predict(treeFit, testdata))
real <- factor(testdata$category)

my_data1 <- data.frame(data = predicted, type = "prediction")
my_data2 <- data.frame(data = real, type = "real")
my_data3 <- rbind(my_data1, my_data2)

# Check if the levels are identical
identical(levels(my_data3[my_data3$type == "prediction", 1]), levels(my_data3[my_data3$type == "real", 1]))

confusionMatrix(my_data3[my_data3$type == "prediction", 1], my_data3[my_data3$type == "real", 1], dnn = c("Prediction", "Reference"))

There may be missing values in testdata. Add the following line before "predictionsTree <- predict(treeFit, testdata)" to remove the NAs. I had the same error and now it works for me.

testdata <- testdata[complete.cases(testdata),]

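The length problem you're running into is probably due to NAs in the training set. Either drop the incomplete cases or impute, so that you have no missing values.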

I had the same problem, and fixed it by changing the data right after reading the data file:

data = na.omit(data)

Thanks, everyone, for the pointers!

Make sure you installed the package with all of its dependencies:

install.packages('caret', dependencies = TRUE)

confusionMatrix( table(prediction, true_value) )

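If your data contains NAs, they are sometimes treated as a factor level, so omit the NAs first:

DF = na.omit(DF)

Then, if the levels your fitted model predicts still don't match, it is better to use a table:

confusionMatrix(table(Arg1, Arg2))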
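Whether NA counts as a level depends on how the factor was built. By default factor() leaves NA out of the levels, while addNA() or factor(..., exclude = NULL) makes it an explicit level, which changes the level count. A small base-R illustration with made-up values:

```r
x <- c("a", "b", NA, "a")   # hypothetical values

f_default <- factor(x)                  # NA is not a level
f_with_na <- factor(x, exclude = NULL)  # NA becomes a level
f_added   <- addNA(factor(x))           # same effect via addNA()

nlevels(f_default)  # 2
nlevels(f_with_na)  # 3
nlevels(f_added)    # 3

# Dropping the missing values first keeps the level sets comparable.
f_clean <- factor(x[!is.na(x)])
nlevels(f_clean)    # 2
```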

I just ran into the same problem and solved it by using R's ordered factor data type.

# Use one sorted level vector for both factors
lvls <- sort(levels(predictionsTree))
table(ordered(predictionsTree, lvls), ordered(testdata$category, lvls))

