Is there a quicker way to make this confusion matrix table in R?
I am trying to make a confusion matrix table in R from the following data frame:
mydf <- structure(list(pred_class = c("dog", "dog", "fish", "cat", "cat",
"dog", "fish", "cat", "dog", "fish"), true_class = c("cat", "cat",
"dog", "cat", "cat", "dog", "dog", "cat", "dog", "fish")), row.names = c(NA,
10L), class = "data.frame")
  pred_class true_class
1        dog        cat
2        dog        cat
3       fish        dog
4        cat        cat
5        cat        cat
6        dog        dog
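I have written code that does what I want: for each class (dog, cat or fish), it labels each row as a true positive, false positive, true negative or false negative.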
library(dplyr)  # provides %>%, mutate() and case_when()

conf_mat <- mydf %>%
  mutate(
    dog_conf = case_when(
      true_class == "dog" & pred_class == "dog" ~ "TP",
      true_class == "dog" & pred_class != "dog" ~ "FN",
      true_class != "dog" & pred_class == "dog" ~ "FP",
      true_class != "dog" & pred_class != "dog" ~ "TN"
    ),
    cat_conf = case_when(
      true_class == "cat" & pred_class == "cat" ~ "TP",
      true_class == "cat" & pred_class != "cat" ~ "FN",
      true_class != "cat" & pred_class == "cat" ~ "FP",
      true_class != "cat" & pred_class != "cat" ~ "TN"
    ),
    fish_conf = case_when(
      true_class == "fish" & pred_class == "fish" ~ "TP",
      true_class == "fish" & pred_class != "fish" ~ "FN",
      true_class != "fish" & pred_class == "fish" ~ "FP",
      true_class != "fish" & pred_class != "fish" ~ "TN"
    )
  )
However, this code is very repetitive and bulky, and I am not sure how to simplify it. Does anyone have any suggestions? Thank you.
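Here is an option with map: we loop over the unique elements of the dataset, create one column per element inside the loop with transmute, using the conditions from the OP's post, and then bind those columns to the original data.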
library(dplyr)
library(purrr)
library(stringr)
map_dfc(unique(unlist(mydf)), ~
    mydf %>%
      transmute(!! str_c(.x, '_conf') :=
        case_when(true_class == .x & pred_class == .x ~ "TP",
                  true_class == .x & pred_class != .x ~ "FN",
                  true_class != .x & pred_class == .x ~ "FP",
                  true_class != .x & pred_class != .x ~ "TN"
        ))) %>%
  bind_cols(mydf, .)
-output
#    pred_class true_class dog_conf cat_conf fish_conf
#1         dog        cat       FP       FN        TN
#2         dog        cat       FP       FN        TN
#3        fish        dog       FN       TN        FP
#4         cat        cat       TN       TP        TN
#5         cat        cat       TN       TP        TN
#6         dog        dog       TP       TN        TN
#7        fish        dog       FN       TN        FP
#8         cat        cat       TN       TP        TN
#9         dog        dog       TP       TN        TN
#10       fish       fish       TN       TN        TP
Or use merge against a key/value lookup dataset
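keydat <- data.frame(pred_class = c(TRUE, TRUE, FALSE, FALSE),
                     true_class = c(TRUE, FALSE, TRUE, FALSE),
                     conf = c("TP", "FP", "FN", "TN"))
un1 <- unique(unlist(mydf))
mydf[paste0(un1, "_conf")] <- lapply(un1, function(x) {
  # flag which of the two class columns equal class x
  flags <- as.data.frame(mydf[c("pred_class", "true_class")] == x)
  flags$row_id <- seq_len(nrow(flags))  # merge() does not keep the input row order
  out <- merge(flags, keydat, all.x = TRUE)
  out$conf[order(out$row_id)]           # restore the original row order
})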
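Both options above encode the same four-way rule, so it can also be written as a plain index lookup in base R. The following is only a minimal sketch: `conf_label()` and `labelled` are hypothetical names introduced for illustration and are not part of either answer.

```r
# Hypothetical helper (not from the original answers): label one class vs. the rest.
conf_label <- function(pred, true, cls) {
  # idx encodes the pair (pred == cls, true == cls):
  # 1 = neither, 2 = true only, 3 = pred only, 4 = both
  idx <- 1 + (true == cls) + 2 * (pred == cls)
  c("TN", "FN", "FP", "TP")[idx]
}

mydf <- data.frame(
  pred_class = c("dog", "dog", "fish", "cat", "cat",
                 "dog", "fish", "cat", "dog", "fish"),
  true_class = c("cat", "cat", "dog", "cat", "cat",
                 "dog", "dog", "cat", "dog", "fish"),
  stringsAsFactors = FALSE
)

# One new <class>_conf column per class that appears in either column
classes <- unique(unlist(mydf[c("pred_class", "true_class")]))
labelled <- mydf
labelled[paste0(classes, "_conf")] <- lapply(
  classes,
  function(cls) conf_label(mydf$pred_class, mydf$true_class, cls)
)
labelled
```

Because the label vector is indexed directly, no join is involved and the row order cannot change.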
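In addition to @akrun's excellent answer: if you want the status of each prediction (TP/TN/FP/FN) in order to compute other statistics/metrics, many of them are provided by the caret package, e.g.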
library(caret)
mydf <- structure(list(pred_class = c("dog", "dog", "fish", "cat", "cat",
"dog", "fish", "cat", "dog", "fish"), true_class = c("cat", "cat",
"dog", "cat", "cat", "dog", "dog", "cat", "dog", "fish")), row.names = c(NA,
10L), class = "data.frame")
conf_matrix <- confusionMatrix(factor(mydf$pred_class),
reference = factor(mydf$true_class),
mode = "everything")
conf_matrix
#> Confusion Matrix and Statistics
#>
#>           Reference
#> Prediction cat dog fish
#>       cat    3   0    0
#>       dog    2   2    0
#>       fish   0   2    1
#>
#> Overall Statistics
#>
#> Accuracy : 0.6
#> 95% CI : (0.2624, 0.8784)
#> No Information Rate : 0.5
#> P-Value [Acc > NIR] : 0.377
#>
#> Kappa : 0.3939
#>
#> Mcnemar's Test P-Value : NA
#>
#> Statistics by Class:
#>
#>                      Class: cat Class: dog Class: fish
#> Sensitivity              0.6000     0.5000      1.0000
#> Specificity              1.0000     0.6667      0.7778
#> Pos Pred Value           1.0000     0.5000      0.3333
#> Neg Pred Value           0.7143     0.6667      1.0000
#> Precision                1.0000     0.5000      0.3333
#> Recall                   0.6000     0.5000      1.0000
#> F1                       0.7500     0.5000      0.5000
#> Prevalence               0.5000     0.4000      0.1000
#> Detection Rate           0.3000     0.2000      0.1000
#> Detection Prevalence     0.3000     0.4000      0.3000
#> Balanced Accuracy        0.8000     0.5833      0.8889
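To connect the per-row TP/FP/FN/TN labels with this table, the one-vs-rest counts for each class can be read straight off the confusion table. This is a base R sketch that does not use caret; the variable names (`tab`, `counts` and so on) are illustrative assumptions.

```r
pred <- c("dog", "dog", "fish", "cat", "cat", "dog", "fish", "cat", "dog", "fish")
true <- c("cat", "cat", "dog", "cat", "cat", "dog", "dog", "cat", "dog", "fish")

# Rows are predictions, columns are the reference, as in the caret printout
lv  <- sort(unique(c(pred, true)))
tab <- table(Prediction = factor(pred, levels = lv),
             Reference  = factor(true, levels = lv))

tp <- diag(tab)                # predicted the class and it was the class
fp <- rowSums(tab) - tp        # predicted the class, truth was another class
fn <- colSums(tab) - tp        # truth was the class, predicted another class
tn <- sum(tab) - tp - fp - fn  # everything else

counts <- rbind(TP = tp, FP = fp, FN = fn, TN = tn)
counts

sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
round(rbind(Sensitivity = sensitivity, Specificity = specificity), 4)
```

The rounded sensitivities and specificities should agree with the "Statistics by Class" rows printed above.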
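Further explanation:
For a 2x2 table with the notation

                 Reference
Predicted    Event    No Event
Event          A         B
No Event       C         D

where "A" = TP, "B" = FP, "C" = FN and "D" = TN, the package/function computes its per-class statistics from formulas over these four cells (see ?caret::confusionMatrix).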
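The formula list itself did not survive in this copy of the answer. As a hedged sketch, the function below implements the formulas as I understand them from the caret help page for confusionMatrix (with beta = 1 for F1). `class_stats()` is a hypothetical name, and the definitions should be checked against `?caret::confusionMatrix` before relying on them.

```r
# Hypothetical helper: per-class statistics from the four cells of a 2x2 table.
# A = TP, B = FP, C = FN, D = TN (rows = predicted, columns = reference).
class_stats <- function(A, B, C, D) {
  n    <- A + B + C + D
  sens <- A / (A + C)
  spec <- D / (B + D)
  prev <- (A + C) / n
  prec <- A / (A + B)
  c(
    "Sensitivity"          = sens,
    "Specificity"          = spec,
    "Pos Pred Value"       = (sens * prev) /
                             (sens * prev + (1 - spec) * (1 - prev)),
    "Neg Pred Value"       = (spec * (1 - prev)) /
                             ((1 - sens) * prev + spec * (1 - prev)),
    "Precision"            = prec,
    "Recall"               = sens,
    "F1"                   = 2 * prec * sens / (prec + sens),  # beta = 1
    "Prevalence"           = prev,
    "Detection Rate"       = A / n,
    "Detection Prevalence" = (A + B) / n,
    "Balanced Accuracy"    = (sens + spec) / 2
  )
}

# "dog" treated as the event: 2 TP, 2 FP, 2 FN, 4 TN
dog_stats <- class_stats(A = 2, B = 2, C = 2, D = 4)
round(dog_stats, 4)
```

For the dog class this gives the same values as the "Class: dog" column in the caret output.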