
Adaboost: Problem with confusion matrix - `data` and `reference` should be factors with the same levels

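I am new to ML and I have a problem with my confusion matrix. Unfortunately, I get this error when generating the confusion matrix:

Error: `data` and `reference` should be factors with the same levels.

Here is my code: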

library(caret)
library(fastAdaboost)

data <- read.csv('~/Desktop/test1.csv', sep = ";")
data1 <- subset(data,select=c(4,5,6,7,8,12,15,16))

set.seed(1234)
parts = createDataPartition(data1$Status.szkody, p = 0.7, list = F)
train = data1[parts, ]
test = data1[-parts, ]

model <- adaboost(Status.szkody ~., data = train,6)

a <- predict(model, train, type = "class")

train$Status.szkody = as.factor(train$Status.szkody)
confusionMatrix(a,train$Status.szkody, mode = "everything")

I can see that `train$Status.szkody` has levels and `a` has none, but how do I deal with that?

> str(a)
List of 5
 $ formula:Class 'formula'  language Status.szkody ~ .
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
 $ votes  : num [1:40845, 1:2] 1.14 1.77 1.59 1.35 1.77 ...
 $ class  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
 $ prob   : num [1:40845, 1:2] 0.644 1 0.9 0.762 1 ...
 $ error  : num 0.234
> str(train$Status.szkody)
 Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
> levels(a)
NULL
> levels(train$Status.szkody)
[1] "0" "1"

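In addition, I tried `cvms::confusion_matrix(train$Status.szkody, a)`, but I got an error: 'targets' and 'predictions' must have the same length.

Any help would be appreciated, as I don't know how to handle this. Thanks in advance.

Edit 1: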

dput(head(data1,30))

structure(list(Miesiąc = c("styczeń", "luty", "styczeń", "styczeń", 
"styczeń", "styczeń", "styczeń", "styczeń", "styczeń", "styczeń", 
"styczeń", "styczeń", "styczeń", "styczeń", "styczeń", "styczeń", 
"styczeń", "styczeń", "styczeń", "styczeń", "styczeń", "styczeń", 
"styczeń", "styczeń", "styczeń", "styczeń", "styczeń", "styczeń", 
"styczeń", "styczeń"), Kwartał = c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Terminal = c("Katowice", "Legnica", 
"Katowice", "Legnica", "Sosnowiec", "Wrocław", "Legnica", "Katowice", 
"Katowice", "Legnica", "Gliwice", "Wrocław", "Wrocław", "Legnica", 
"Wrocław", "Legnica", "Sosnowiec", "Wrocław", "Katowice", "Gliwice", 
"Gliwice", "Gliwice", "Katowice", "Wrocław", "Legnica", "Legnica", 
"Gliwice", "Legnica", "Katowice", "Legnica"), Towar = c("RTV", 
"RTV", "Telefony", "AGD", "Komputery", "AGD małe", "AGD do zabudowy", 
"Telefony", "RTV", "AGD małe", "AGD", "RTV", "Komputery", "AGD małe", 
"RTV", "AGD do zabudowy", "RTV", "Komputery", "Telefony", "Komputery", 
"RTV", "AGD małe", "AGD małe", "AGD", "Telefony", "Telefony", 
"AGD małe", "AGD do zabudowy", "AGD do zabudowy", "AGD do zabudowy"
), Status.szkody = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 
1L, 0L, 0L, 0L), Kraj = c("PL", "PL", "PL", "PL", "PL", "PL", 
"PL", "PL", "PL", "DE", "DE", "DE", "DE", "PL", "PL", "PL", "PL", 
"PL", "PL", "DE", "DE", "DE", "DE", "DE", "DE", "AT", "DE", "DE", 
"AT", "DE"), Usługa = c("Express", "Express", "Express", "Express", 
"Express", "Express", "Express", "Express", "Express", "Express", 
"Express", "Express", "Express", "Express", "Express", "Express", 
"Express", "Express", "Express", "Express", "Express", "Express", 
"Express", "Express", "Express", "Express", "Express", "Express", 
"Express", "Express"), Partner = c("Partner D", "Partner A", 
"Partner D", "Partner A", "Partner C", "Partner D", "Partner D", 
"Partner A", "Partner D", "Partner B", "Partner C", "Partner A", 
"Partner C", "Partner B", "Partner D", "Partner B", "Partner D", 
"Partner E", "Partner B", "Partner D", "Partner E", "Partner D", 
"Partner E", "Partner B", "Partner D", "Partner D", "Partner C", 
"Partner A", "Partner E", "Partner B")), row.names = c(NA, 30L
), class = "data.frame")

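You should use `a$class`, which holds your model's predictions as a vector. `predict()` returns a list here, not a factor, so passing `a` itself to `confusionMatrix()` fails. You can use the following code: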

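library(caret)
library(fastAdaboost)

# data1 is the data frame from the question (see the dput() output above)
set.seed(1234)
parts <- createDataPartition(data1$Status.szkody, p = 0.7, list = FALSE)
train <- data1[parts, ]
test <- data1[-parts, ]

# Fit AdaBoost with 6 boosting iterations
model <- adaboost(Status.szkody ~ ., data = train, nIter = 6)

# predict() returns a list; the predicted labels are in its $class element
a <- predict(model, train, type = "class")

train$Status.szkody <- as.factor(train$Status.szkody)
confusionMatrix(a$class, train$Status.szkody, mode = "everything")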
#> Warning in confusionMatrix.default(a$class, train$Status.szkody, mode =
#> "everything"): Levels are not in the same order for reference and data.
#> Refactoring data to match.
#> Confusion Matrix and Statistics
#> 
#>           Reference
#> Prediction  0  1
#>          0 20  1
#>          1  0  0
#>                                           
#>                Accuracy : 0.9524          
#>                  95% CI : (0.7618, 0.9988)
#>     No Information Rate : 0.9524          
#>     P-Value [Acc > NIR] : 0.7358          
#>                                           
#>                   Kappa : 0               
#>                                           
#>  Mcnemar's Test P-Value : 1.0000          
#>                                           
#>             Sensitivity : 1.0000          
#>             Specificity : 0.0000          
#>          Pos Pred Value : 0.9524          
#>          Neg Pred Value :    NaN          
#>               Precision : 0.9524          
#>                  Recall : 1.0000          
#>                      F1 : 0.9756          
#>              Prevalence : 0.9524          
#>          Detection Rate : 0.9524          
#>    Detection Prevalence : 1.0000          
#>       Balanced Accuracy : 0.5000          
#>                                           
#>        'Positive' Class : 0               
#> 
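
The warning at the top of that output suggests that caret found the two factors' levels in a different order and reordered the predictions on its own. If you would rather do that step explicitly, the following base R sketch shows one way; `pred` and `ref` are made-up toy vectors, not the data from the question.

```r
# Toy vectors (invented for illustration): same labels, different level order.
pred <- factor(c("1", "0", "0", "1"), levels = c("1", "0"))
ref  <- factor(c("0", "0", "1", "1"), levels = c("0", "1"))

# Re-declare the predictions using the reference's level order.
# Values are matched by label, so no observation changes class.
pred_aligned <- factor(pred, levels = levels(ref))

# Both factors now share the same levels in the same order.
identical(levels(pred_aligned), levels(ref))

# A plain cross-tabulation of predictions against the reference.
table(Prediction = pred_aligned, Reference = ref)
```

Applied to the question's objects, this would presumably look like `factor(a$class, levels = levels(train$Status.szkody))` before the call to `confusionMatrix()`.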

Created on 2022-07-23 by the reprex package (v2.0.1)
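
To see why `levels(a)` came back as `NULL`, it may help to look at a stand-in for the object shown in the question's `str(a)` output. The sketch below builds a hypothetical list `a_mock` with the same five element names (the values are invented) so it runs without fastAdaboost. It also hints at why `cvms::confusion_matrix()` reported a length mismatch: the length of a list is its number of elements, not the number of predicted rows.

```r
# Hypothetical stand-in for the list returned by predict(); it mimics the
# element names from str(a) in the question, with made-up values.
a_mock <- list(
  formula = Status.szkody ~ .,
  votes   = matrix(c(1.14, 0.20, 0.30, 1.50), nrow = 2),
  class   = factor(c("0", "1"), levels = c("0", "1")),
  prob    = matrix(c(0.64, 0.10, 0.36, 0.90), nrow = 2),
  error   = 0.5
)
truth <- factor(c("0", "0"), levels = c("0", "1"))

is.factor(a_mock)   # a list is not a factor
levels(a_mock)      # so it carries no levels attribute
length(a_mock)      # counts list elements, not observations

# The predicted labels live in the $class element, which is a factor
# with one entry per observation.
levels(a_mock$class)
length(a_mock$class) == length(truth)
```

Passing `a$class` instead of `a` should therefore address both the caret error and, most likely, the cvms length error as well.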
