Error when plotting multiclass ROC curve in R
I have built an SVM predictor that classifies samples into one of three groups: "good", "bad" or "ok". However, the test dataset only contains samples classed as "good" or "bad". I get an error when I try to use multi_roc, and I'm not sure of the best way to solve it. The example I've made is below:
library(tidymodels)
library(mlbench)
library(multiROC)
data(Ionosphere)
# preprocess dataset
Ionosphere <- Ionosphere %>% select(-V1, -V2)
# split into training and test data
ion_split <- initial_split(Ionosphere, prop = 3/5)
ion_train <- training(ion_split)
ion_test <- testing(ion_split)
# making an artificial third class in the training set for this example
ion_train[,33] <- as.character(ion_train[,33])
ion_train[1:7,33] <- "ok"
ion_train[,33] <- as.factor(ion_train[,33])
# make a recipe
iono_rec <-
  recipe(Class ~ ., data = ion_train) %>%
  step_normalize(all_predictors())
# build the model and workflow
svm_mod <-
  svm_rbf(cost = tune(), rbf_sigma = tune()) %>%
  set_mode("classification") %>%
  set_engine("kernlab")
svm_workflow <-
  workflow() %>%
  add_recipe(iono_rec) %>%
  add_model(svm_mod)
# run model tuning
set.seed(35)
recipe_res <-
  svm_workflow %>%
  tune_grid(
    resamples = bootstraps(ion_train, times = 2),
    metrics = metric_set(roc_auc),
    control = control_grid(verbose = TRUE, save_pred = TRUE)
  )
# choose best model, finalise workflow
best_mod <- recipe_res %>% select_best(metric = "roc_auc")
final_wf <- finalize_workflow(svm_workflow, best_mod)
final_mod <- final_wf %>% fit(ion_train)
predict_res <- predict(
  final_mod,
  ion_test,
  type = "prob")
results <- predict_res %>%
  cbind(ion_test$Class) %>%
  dplyr::rename(
    bad_pred_svm = .pred_bad,
    good_pred_svm = .pred_good,
    ok_pred_svm = .pred_ok,
    class = `ion_test$Class`
  ) %>%
  mutate(
    bad_true = ifelse(class == "bad", 1, 0),
    good_true = ifelse(class == "good", 1, 0),
    ok_true = ifelse(class == "ok", 1, 0)
  ) %>%
  dplyr::select(-class)
This produces a results data frame that looks like this:
  bad_pred_svm good_pred_svm ok_pred_svm bad_true good_true ok_true
1   0.01166109    0.92349066  0.06484826        0         1       0
2   0.82937620    0.07576908  0.09485472        1         0       0
3   0.05858563    0.88043189  0.06098248        0         1       0
4   0.91602211    0.04624037  0.03773753        1         0       0
5   0.91841475    0.04407115  0.03751410        1         0       0
6   0.01014520    0.94295540  0.04689940        0         1       0
When I try to pass this to multi_roc, I get an error:
multi_roc_svm <- multi_roc(results, force_diag = TRUE)
Error in approx(res_sp[[i]][[j]], res_se[[i]][[j]], all_sp, yleft = 1, :
  need at least two non-NA values to interpolate
In addition: Warning messages:
1: In regularize.values(x, y, ties, missing(ties), na.rm = na.rm) :
  collapsing to unique 'x' values
2: In regularize.values(x, y, ties, missing(ties), na.rm = na.rm) :
  collapsing to unique 'x' values
I'm 99% sure this error occurs because my test data frame has no samples of the "ok" class, but I don't know how to get around this. Could I plot the multiclass ROC curve by hand?
I don't know what package multi_roc() is in, but the tidymodels solution is pretty easy.
If you just want the ROC AUC value, you can use the yardstick function roc_auc(). Since the test data contain only two classes, the estimate below is a binary one:
> predict_res %>%
+   bind_cols(ion_test) %>%
+   # or roc_curve(Class, .pred_bad)
+   roc_auc(Class, .pred_bad)
# A tibble: 1 x 3
  .metric .estimator .estimate
  <chr>   <chr>          <dbl>
1 roc_auc binary         0.976
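On the question of plotting the curves by hand: below is a minimal base-R sketch, assuming a one-vs-rest treatment in which any class with no positive samples in the test set (here "ok") is simply skipped, since its true positive rate is undefined. The helper names roc_points and roc_area are hypothetical, not part of any package, and the data frame holds only the six rows printed in the question, so the resulting areas illustrate the mechanics rather than the model's performance on the full test set.

```r
# One-vs-rest ROC points for a single class, computed by hand.
# truth: 0/1 vector (1 = sample belongs to the class)
# score: predicted probability for that class
# (roc_points and roc_area are hypothetical helper names.)
roc_points <- function(truth, score) {
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  # Without both positives and negatives the curve is undefined.
  if (n_pos == 0 || n_neg == 0) return(NULL)
  ord <- order(score, decreasing = TRUE)
  truth <- truth[ord]
  score <- score[ord]
  # Keep one point per distinct score so tied scores share a threshold.
  keep <- !duplicated(score, fromLast = TRUE)
  data.frame(
    fpr = c(0, (cumsum(truth == 0) / n_neg)[keep]),
    tpr = c(0, (cumsum(truth == 1) / n_pos)[keep])
  )
}

# Trapezoidal area under a curve returned by roc_points().
roc_area <- function(pts) {
  sum(diff(pts$fpr) * (head(pts$tpr, -1) + tail(pts$tpr, -1)) / 2)
}

# Only the six rows printed in the question, for illustration.
results <- data.frame(
  bad_pred_svm  = c(0.01166109, 0.82937620, 0.05858563,
                    0.91602211, 0.91841475, 0.01014520),
  good_pred_svm = c(0.92349066, 0.07576908, 0.88043189,
                    0.04624037, 0.04407115, 0.94295540),
  ok_pred_svm   = c(0.06484826, 0.09485472, 0.06098248,
                    0.03773753, 0.03751410, 0.04689940),
  bad_true  = c(0, 1, 0, 1, 1, 0),
  good_true = c(1, 0, 1, 0, 0, 1),
  ok_true   = c(0, 0, 0, 0, 0, 0)
)

classes <- c("bad", "good", "ok")
curves <- lapply(classes, function(cl) {
  roc_points(results[[paste0(cl, "_true")]],
             results[[paste0(cl, "_pred_svm")]])
})
names(curves) <- classes
# Drop classes whose curve is undefined ("ok" has no positives here).
curves <- Filter(Negate(is.null), curves)
aucs <- sapply(curves, roc_area)

# Draw the chance diagonal, then one line per remaining class.
plot(c(0, 1), c(0, 1), type = "l", lty = 2,
     xlab = "False positive rate", ylab = "True positive rate")
for (i in seq_along(curves)) {
  lines(curves[[i]]$fpr, curves[[i]]$tpr, col = i + 1)
}
legend("bottomright", legend = names(curves),
       col = seq_along(curves) + 1, lty = 1)
```

Applied to the full results data frame from the question instead of the six-row excerpt, the same loop should give one curve each for "bad" and "good". Whether skipping the empty class is acceptable depends on what the multiclass summary is meant to report.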