Confusion matrix error: the data and reference factors must have the same number of levels

Question

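I have trained a linear regression model using the caret package in R. I am now trying to generate a confusion matrix and keep getting the following error:

Error in confusionMatrix.default(pred, testing$Final) : the data and reference factors must have the same number of levels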

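library(caret)  # provides createDataPartition(), train(), confusionMatrix()

EnglishMarks <- read.csv("E:/Subject Wise Data/EnglishMarks.csv", header = TRUE)

# Split into 70% training and 30% testing rows
inTrain <- createDataPartition(y = EnglishMarks$Final, p = 0.7, list = FALSE)
training <- EnglishMarks[inTrain, ]
testing <- EnglishMarks[-inTrain, ]

# Note: treeFit and testdata are not defined anywhere in this snippet
predictionsTree <- predict(treeFit, testdata)
confusionMatrix(predictionsTree, testdata$catgeory)

# Fit a linear model, round its predictions, and format them as character strings
modFit <- train(Final ~ UT1 + UT2 + HalfYearly + UT3 + UT4, method = "lm", data = training)
pred <- format(round(predict(modFit, testing)))
confusionMatrix(pred, testing$Final)  # this call raises the error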

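The error occurs when generating the confusion matrix. The levels are the same on both objects. I cannot figure out what the problem is. Their structure and levels are given below. They should be identical. Any help would be greatly appreciated, as this is driving me crazy!!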

> str(pred)
chr [1:148] "85" "84" "87" "65" "88" "84" "82" "84" "65" "78" "78" "88" "85"  
"86" "77" ...
> str(testing$Final)
int [1:148] 88 85 86 70 85 85 79 85 62 77 ...

> levels(pred)
NULL
> levels(testing$Final)
NULL

Answer 1

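Run table(pred) and table(testing$Final). You will see that at least one number in the testing set is never predicted (i.e. it never appears in pred). That is what "different number of levels" means. There is an example of a custom function that works around this problem here.

However, I found that this trick works fine: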

test <- testing$Final  # true values; `test` is not defined in the question, so this is assumed
table(factor(pred, levels = min(test):max(test)),
      factor(test, levels = min(test):max(test)))

It should give you exactly the same confusion matrix as the function would.
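To make the diagnosis concrete, here is a minimal base R sketch on toy data. The vectors `truth` and `pred` are made-up stand-ins for testing$Final and the rounded predictions, not the asker's data. It shows which values are never predicted and then tabulates both vectors against one shared set of levels.

```r
# Toy data standing in for testing$Final (true marks) and pred (rounded predictions)
truth <- c(70L, 71L, 72L, 73L, 72L, 70L)
pred  <- c(70L, 72L, 72L, 72L, 72L, 70L)   # 71 and 73 are never predicted

# Values present in the reference but absent from the predictions
missing_from_pred <- setdiff(unique(truth), unique(pred))
print(missing_from_pred)

# Give both vectors the same levels, covering the whole observed range
lvls <- min(truth, pred):max(truth, pred)
cm <- table(Prediction = factor(pred, levels = lvls),
            Reference  = factor(truth, levels = lvls))
print(cm)

# Overall accuracy is the share of counts on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```

Because both factors are built from the same `lvls`, the table is square even though some values never occur in `pred`.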

Answer 2

confusionMatrix(pred,testing$Final)

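Whenever you try to build a confusion matrix, make sure that both the true values and the predicted values are of the factor data type.

Here both pred and testing$Final must be of type factor. Rather than checking the levels, check the type of both variables and convert them to factor if they are not.

Here testing$Final is of type int. Convert it to a factor and then build the confusion matrix.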

Answer 3

I had the same issue. I guess it happened because the data argument was not cast to a factor as I expected. Try:

confusionMatrix(pred,as.factor(testing$Final))

Hope this helps.

Answer 4

Something like the following seems to work for me. The idea is similar to @nayriz's:

confusionMatrix(
  factor(pred, levels = 1:148),
  factor(testing$Final, levels = 1:148)
)

The key is to make sure the factor levels match.
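As a rough sketch of the same idea without a hard-coded range, the levels can be taken from the values that actually occur in either vector. The helper `to_common_factors` below is hypothetical (it is not part of caret), and the sample vectors are illustrative only.

```r
# Hypothetical helper (not part of caret): convert two vectors to factors
# that share one set of levels, taken from the values seen in either vector.
# trimws() removes padding that format() may add to character predictions.
to_common_factors <- function(pred, ref) {
  pred_chr <- trimws(as.character(pred))
  ref_chr  <- trimws(as.character(ref))
  lvls <- sort(union(pred_chr, ref_chr))   # sorted as text, not as numbers
  list(pred = factor(pred_chr, levels = lvls),
       ref  = factor(ref_chr,  levels = lvls))
}

pred_chr <- c("85", "84", "87", "65")   # character predictions, as in the question
ref_int  <- c(88L, 85L, 86L, 65L)        # integer reference values

# Converting each vector on its own gives two different level sets
separate_match <- identical(levels(as.factor(pred_chr)), levels(as.factor(ref_int)))
print(separate_match)

both <- to_common_factors(pred_chr, ref_int)
shared_match <- identical(levels(both$pred), levels(both$ref))
print(shared_match)
print(table(Prediction = both$pred, Reference = both$ref))
```

The two resulting factors can then be passed on to whatever builds the confusion matrix, since their level sets are identical by construction.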

Answer 5

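With a similar error, I forced the GLM predictions to have the same class as the dependent variable.

For example, a GLM will predict a "numeric" class. But since the target variable is of class "factor", I ran into an error.

Code producing the error: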

# Predicting using logistic model
glm.probs = predict(model_glm, newdata = test, type = "response")
test$pred_glm = ifelse(glm.probs > 0.5, "1", "0")

# Checking the accuracy of the logistic model
confusionMatrix(test$default, test$pred_glm)

Result:

Error: `data` and `reference` should be factors with the same levels.

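Corrected code:

# Predicting using logistic model
glm.probs = predict(model_glm, newdata = test, type = "response")
test$pred_glm = ifelse(glm.probs > 0.5, "1", "0")
test$pred_glm = as.factor(test$pred_glm)  # convert the character predictions to a factor

# Checking the accuracy of the logistic model
# Note: confusionMatrix() takes the predictions first (data) and the true
# classes second (reference); this call passes them the other way round
confusionMatrix(test$default, test$pred_glm)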

Result:

Confusion Matrix and Statistics

          Reference
Prediction     0     1
         0   182  1317
         1   122 22335
                                          
               Accuracy : 0.9399          
                 95% CI : (0.9368, 0.9429)
    No Information Rate : 0.9873          
    P-Value [Acc > NIR] : 1
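One possible variation on the thresholding step, sketched in base R on made-up numbers: build the predicted factor with the target's own levels, so that both levels exist even when one class is never predicted. The names `truth`, `probs`, and `all_zero` are illustrative and do not come from the answer.

```r
# Simulated stand-ins: a two-level factor target and predicted probabilities
# of the kind predict(..., type = "response") returns for a logistic model
truth <- factor(c("0", "1", "1", "0", "1"), levels = c("0", "1"))
probs <- c(0.2, 0.9, 0.7, 0.4, 0.3)

# Threshold at 0.5, then reuse the target's levels for the predictions
pred <- factor(ifelse(probs > 0.5, "1", "0"), levels = levels(truth))

print(identical(levels(pred), levels(truth)))
print(table(Prediction = pred, Reference = truth))

# Even if every probability falls below the threshold, both levels are kept
all_zero <- factor(ifelse(c(0.1, 0.2) > 0.5, "1", "0"), levels = levels(truth))
print(nlevels(all_zero))
```

With plain as.factor(), a prediction vector containing only "0" would end up with a single level, which is the mismatch this thread is about.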

Answer 6

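I ran into this problem because of NAs in the target variable of the dataset. If you are using the tidyverse, you can use the drop_na function to remove rows that contain NAs. Like this:

library(tidyverse)  # attaches tidyr, which provides drop_na()
iris %>% drop_na(Species) # Removes rows where Species column has NA
iris %>% drop_na() # Removes rows where any column has NA

In base R, it might look like this:

iris[!is.na(iris$Species), ] # Removes rows where Species column has NA
na.omit(iris) # Removes rows where any column has NA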
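If the NAs only show up once the predictions and the reference are already separate vectors, the same filtering can be done pairwise. This is a small base R sketch on toy factors, with names chosen purely for illustration.

```r
# Toy vectors with one missing reference value
truth <- factor(c("a", "b", NA, "a", "b"), levels = c("a", "b"))
pred  <- factor(c("a", "b", "a", "b", "b"), levels = c("a", "b"))

# Keep only the positions where both the prediction and the reference are present
keep <- !is.na(pred) & !is.na(truth)
cm <- table(Prediction = pred[keep], Reference = truth[keep])

print(sum(keep))
print(cm)
```

Filtering both vectors with the same logical index keeps them aligned, so each remaining prediction is still compared with its own reference value.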

Answer 7

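We get this error when creating the confusion matrix. When creating a confusion matrix, we need to make sure that the predicted values and the actual values are of data type "factor". If they are of any other data type, we must convert them to "factor" before generating the confusion matrix. After this conversion, build the confusion matrix.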

library(caret)  # for confusionMatrix()

predicted <- factor(predict(treeFit, testdata))
real <- factor(testdata$catgeory)
my_data1 <- data.frame(data = predicted, type = "prediction")
my_data2 <- data.frame(data = real, type = "real")
# rbind() combines the levels of both factor columns into one shared set
my_data3 <- rbind(my_data1, my_data2)
# Check if the levels are identical
identical(levels(my_data3[my_data3$type == "prediction", 1]),
          levels(my_data3[my_data3$type == "real", 1]))
confusionMatrix(my_data3[my_data3$type == "prediction", 1],
                my_data3[my_data3$type == "real", 1], dnn = c("Prediction", "Reference"))

I took this from here.

Answer 8

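You are using regression and trying to generate a confusion matrix. I believe a confusion matrix is meant for classification tasks. For regression, people generally use the R^2 and RMSE metrics.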
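For reference, both metrics are easy to compute by hand. The sketch below uses made-up marks in place of testing$Final and the unrounded model predictions, and it computes R^2 as 1 - SSE/SST, which is one common definition (others, such as the squared correlation, can give slightly different values).

```r
# Toy marks standing in for testing$Final and the unrounded model predictions
actual    <- c(88, 85, 86, 70, 85)
predicted <- c(85, 84, 87, 65, 88)

# Root mean squared error: typical size of a prediction error, in marks
rmse <- sqrt(mean((actual - predicted)^2))

# R^2 as 1 - (sum of squared errors) / (total sum of squares)
r_squared <- 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)

print(rmse)
print(r_squared)
```

Unlike a confusion matrix, neither metric requires the predictions to be rounded or converted to factors.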

Answer 1: 11, 2015-05-10 04:25:53
Answer 2: 11, 2018-07-31 09:36:58
Answer 3: 8, 2019-04-08 01:08:42
Answer 4: 4, 2018-04-30 20:57:48
Answer 5: 2, 2021-01-28 17:26:07
Answer 6: 0, 2020-12-11 17:00:29
Answer 7: 0, 2021-12-08 14:25:08
Answer 8: -4, 2019-01-01 02:32:53
