Is there a quicker way to make this confusion matrix table in R?
I'm trying to make a confusion matrix table in R using the following data frame:
mydf <- structure(list(pred_class = c("dog", "dog", "fish", "cat", "cat",
"dog", "fish", "cat", "dog", "fish"), true_class = c("cat", "cat",
"dog", "cat", "cat", "dog", "dog", "cat", "dog", "fish")), row.names = c(NA,
10L), class = "data.frame")
pred_class true_class
1 dog cat
2 dog cat
3 fish dog
4 cat cat
5 cat cat
6 dog dog
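I have already written code that does what I want: for each class (dog, cat or fish), it says whether each row is a true positive, false positive, true negative or false negative.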
conf_mat <- mydf %>%
  mutate(
    dog_conf = case_when(
      true_class == "dog" & pred_class == "dog" ~ "TP",
      true_class == "dog" & pred_class != "dog" ~ "FN",
      true_class != "dog" & pred_class == "dog" ~ "FP",
      true_class != "dog" & pred_class != "dog" ~ "TN"
    ),
    cat_conf = case_when(
      true_class == "cat" & pred_class == "cat" ~ "TP",
      true_class == "cat" & pred_class != "cat" ~ "FN",
      true_class != "cat" & pred_class == "cat" ~ "FP",
      true_class != "cat" & pred_class != "cat" ~ "TN"
    ),
    fish_conf = case_when(
      true_class == "fish" & pred_class == "fish" ~ "TP",
      true_class == "fish" & pred_class != "fish" ~ "FN",
      true_class != "fish" & pred_class == "fish" ~ "FP",
      true_class != "fish" & pred_class != "fish" ~ "TN"
    )
  )
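However, this code is very repetitive and bulky, and I'm not sure how to simplify it. Does anyone have any suggestions? Thank you.
Here is an option with map, where we loop over the unique elements of the dataset, create the columns with transmute inside the loop based on the conditions specified in the OP's post, and bind those columns with the original data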
library(dplyr)
library(purrr)
library(stringr)
map_dfc(unique(unlist(mydf)), ~
    mydf %>%
      transmute(!! str_c(.x, '_conf') :=
        case_when(true_class == .x & pred_class == .x ~ "TP",
                  true_class == .x & pred_class != .x ~ "FN",
                  true_class != .x & pred_class == .x ~ "FP",
                  true_class != .x & pred_class != .x ~ "TN"
        ))) %>%
  bind_cols(mydf, .)
-output
# pred_class true_class dog_conf cat_conf fish_conf
#1 dog cat FP FN TN
#2 dog cat FP FN TN
#3 fish dog FN TN FP
#4 cat cat TN TP TN
#5 cat cat TN TP TN
#6 dog dog TP TN TN
#7 fish dog FN TN FP
#8 cat cat TN TP TN
#9 dog dog TP TN TN
#10 fish fish TN TN TP
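The same per-row labelling can also be sketched in base R, without dplyr or purrr. This is an alternative to the map approach rather than the answerer's code, and the helper name conf_label is hypothetical. It relies on the fact that the four cases are fully determined by two comparisons, so nested ifelse calls are enough.

```r
mydf <- data.frame(
  pred_class = c("dog", "dog", "fish", "cat", "cat",
                 "dog", "fish", "cat", "dog", "fish"),
  true_class = c("cat", "cat", "dog", "cat", "cat",
                 "dog", "dog", "cat", "dog", "fish"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label every row as TP/FN/FP/TN with respect to one class
conf_label <- function(pred, truth, cls) {
  ifelse(truth == cls,
         ifelse(pred == cls, "TP", "FN"),   # the row truly belongs to cls
         ifelse(pred == cls, "FP", "TN"))   # the row truly belongs to another class
}

# One label vector per class, named <class>_conf
classes <- unique(c(mydf$pred_class, mydf$true_class))
labels <- lapply(classes, function(cls) {
  conf_label(mydf$pred_class, mydf$true_class, cls)
})
names(labels) <- paste0(classes, "_conf")

result <- cbind(mydf, as.data.frame(labels, stringsAsFactors = FALSE))
print(result)
```

Because the class list is derived from the data, adding a fourth class to mydf would produce a fourth column with no change to the code.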
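Or use merge on a key/value dataset
# pred TRUE / true FALSE is a false positive; pred FALSE / true TRUE is a false negative
keydat <- data.frame(pred_class = c(TRUE, TRUE, FALSE, FALSE),
                     true_class = c(TRUE, FALSE, TRUE, FALSE),
                     conf = c("TP", "FP", "FN", "TN"))
un1 <- unique(unlist(mydf))
mydf[paste0(un1, "_conf")] <- lapply(un1, function(x) {
  m <- as.data.frame(mydf[c("pred_class", "true_class")] == x)
  m$id <- seq_len(nrow(m))   # merge() sorts on the key columns, so keep a row id
  out <- merge(m, keydat, all.x = TRUE)
  out$conf[order(out$id)]    # restore the original row order
})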
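The key/value idea can also be sketched with a named character vector instead of a data frame. Indexing a vector by name returns elements in the order of the keys supplied, so the labels stay aligned with the rows, whereas merge() sorts its result on the key columns by default. The names lookup and label_class below are hypothetical, chosen only for this illustration.

```r
mydf <- data.frame(
  pred_class = c("dog", "dog", "fish", "cat", "cat",
                 "dog", "fish", "cat", "dog", "fish"),
  true_class = c("cat", "cat", "dog", "cat", "cat",
                 "dog", "dog", "cat", "dog", "fish"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: names encode "<prediction is class>.<truth is class>"
lookup <- c("TRUE.TRUE"  = "TP", "TRUE.FALSE"  = "FP",
            "FALSE.TRUE" = "FN", "FALSE.FALSE" = "TN")

# Build a key per row and look it up; the row order is preserved
label_class <- function(df, cls) {
  key <- paste(df$pred_class == cls, df$true_class == cls, sep = ".")
  unname(lookup[key])
}

classes <- unique(c(mydf$pred_class, mydf$true_class))
labelled <- mydf
labelled[paste0(classes, "_conf")] <- lapply(classes, function(cls) {
  label_class(mydf, cls)
})
print(labelled)
```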
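In addition to @akrun's excellent answer: if you want to determine the status of each prediction (TP/TN/FP/FN) in order to calculate other statistics/metrics, many of them can be provided by the caret package, e.g.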
library(caret)
mydf <- structure(list(pred_class = c("dog", "dog", "fish", "cat", "cat",
"dog", "fish", "cat", "dog", "fish"), true_class = c("cat", "cat",
"dog", "cat", "cat", "dog", "dog", "cat", "dog", "fish")), row.names = c(NA,
10L), class = "data.frame")
conf_matrix <- confusionMatrix(factor(mydf$pred_class),
reference = factor(mydf$true_class),
mode = "everything")
conf_matrix
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction cat dog fish
#> cat 3 0 0
#> dog 2 2 0
#> fish 0 2 1
#>
#> Overall Statistics
#>
#> Accuracy : 0.6
#> 95% CI : (0.2624, 0.8784)
#> No Information Rate : 0.5
#> P-Value [Acc > NIR] : 0.377
#>
#> Kappa : 0.3939
#>
#> Mcnemar's Test P-Value : NA
#>
#> Statistics by Class:
#>
#> Class: cat Class: dog Class: fish
#> Sensitivity 0.6000 0.5000 1.0000
#> Specificity 1.0000 0.6667 0.7778
#> Pos Pred Value 1.0000 0.5000 0.3333
#> Neg Pred Value 0.7143 0.6667 1.0000
#> Precision 1.0000 0.5000 0.3333
#> Recall 0.6000 0.5000 1.0000
#> F1 0.7500 0.5000 0.5000
#> Prevalence 0.5000 0.4000 0.1000
#> Detection Rate 0.3000 0.2000 0.1000
#> Detection Prevalence 0.3000 0.4000 0.3000
#> Balanced Accuracy 0.8000 0.5833 0.8889
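If only the per-class TP/FP/FN/TN counts are needed, they can be read off the cross-tabulation directly. The following is a minimal base R sketch that does not use caret; it assumes the same orientation as the table above (predictions in rows, reference in columns), and the object names are illustrative.

```r
pred <- c("dog", "dog", "fish", "cat", "cat",
          "dog", "fish", "cat", "dog", "fish")
truth <- c("cat", "cat", "dog", "cat", "cat",
           "dog", "dog", "cat", "dog", "fish")

# Use a shared level set so the table is square, with classes in the same order
lv <- sort(unique(c(pred, truth)))
tab <- table(Prediction = factor(pred, levels = lv),
             Reference  = factor(truth, levels = lv))

tp <- diag(tab)            # predicted as the class and truly the class
fp <- rowSums(tab) - tp    # predicted as the class, truly something else
fn <- colSums(tab) - tp    # truly the class, predicted as something else
tn <- sum(tab) - tp - fp - fn

counts <- rbind(TP = tp, FP = fp, FN = fn, TN = tn)
print(counts)
```

For the cat class this gives TP = 3, FP = 0, FN = 2 and TN = 5, which is consistent with the sensitivity of 0.6 and specificity of 1 reported above.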
Further explanation:
For a 2x2 table with the notation
            Reference
Predicted   Event   No Event
Event       A       B
No Event    C       D
where "A" = TP, "B" = FP, "C" = FN and "D" = TN, the formulas used by the package/function are:
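As a sketch, the standard definitions of these statistics in terms of A, B, C and D can be written as a small R function. The function name class_metrics is hypothetical, and the code is not taken from the caret source; it is included because, for the counts in this example, it reproduces the "Statistics by Class" figures printed above.

```r
# Hypothetical helper: per-class statistics from the four cells of a 2x2 table
# A = TP, B = FP, C = FN, D = TN
class_metrics <- function(A, B, C, D) {
  total       <- A + B + C + D
  sensitivity <- A / (A + C)   # also called recall
  specificity <- D / (B + D)
  precision   <- A / (A + B)   # positive predictive value for these counts
  c(
    Sensitivity            = sensitivity,
    Specificity            = specificity,
    Precision              = precision,
    Recall                 = sensitivity,
    F1                     = 2 * precision * sensitivity / (precision + sensitivity),
    Prevalence             = (A + C) / total,
    "Detection Rate"       = A / total,
    "Detection Prevalence" = (A + B) / total,
    "Balanced Accuracy"    = (sensitivity + specificity) / 2
  )
}

# Counts for the cat and dog classes, read from the confusion matrix above
cat_stats <- class_metrics(A = 3, B = 0, C = 2, D = 5)
dog_stats <- class_metrics(A = 2, B = 2, C = 2, D = 4)
print(round(cat_stats, 4))
print(round(dog_stats, 4))
```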