R: caret package: Brier score

Question

I want to perform a logistic regression with the train() function from the caret package. My model looks something like this:

model <- train(Y ~.,
  data = train_data,
  family = "binomial",
  method = "glmnet")

With the resulting model, I want to make predictions:

pred <- predict(model, newdata = test_data, s = "lambda.min", type = "prob")

Now I want to evaluate how good the model's predictions are compared with the actual test data. I know how to obtain the ROC curve and the AUC for this, but I am also interested in the Brier score, whose formula is almost identical to the MSE.

The problem I am facing is that the type argument in predict only allows "prob" (or "class", which I am not interested in). It gives the probability of a prediction being a ONE (e.g. 0.64) and the complementary probability of it being a ZERO (e.g. 0.36). For the Brier score, however, I need one probability estimate per prediction that carries the information of both (e.g. a value above 0.5 would indicate a 1 and a value below 0.5 would indicate a 0).

I have not found any way to obtain the Brier score in the caret package. I am aware that with cv.glmnet (from the glmnet package) the predict function accepts the argument "response", which would solve my problem. However, out of personal preference I would like to stay with the caret package. Thanks for the help!

Answer 1

If we go by the Wikipedia definition of the Brier score:

The most common formulation of the Brier score is

$$BS = \frac{1}{N} \sum_{t=1}^{N} (f_t - o_t)^2$$

where f_t is the probability that was forecast, o_t is the actual outcome of the event at instance t (0 or 1), and N is the number of forecasting instances.
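
As a minimal sketch of that formula in base R (the helper name brier_score and the toy numbers are illustrative assumptions, not part of caret):

```r
# Hypothetical helper: mean squared difference between forecast
# probabilities (f_t) and observed 0/1 outcomes (o_t).
brier_score <- function(f_t, o_t) {
  stopifnot(length(f_t) == length(o_t))
  mean((f_t - o_t)^2)
}

# Toy example: three forecasts against three observed outcomes.
f_t <- c(0.9, 0.2, 0.6)
o_t <- c(1, 0, 0)
brier_score(f_t, o_t)
```

A perfect forecast scores 0 and a constant forecast of 0.5 scores 0.25, so lower values are better.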

In R, if your label is a factor, logistic regression always predicts with respect to the second level, so you just compute the probability and the 0/1 outcome with respect to that level. For example:

library(caret)
idx = sample(nrow(iris),100)
data = iris
data$Species = factor(ifelse(data$Species=="versicolor","v","o"))
levels(data$Species)
[1] "o" "v"

In this case, o is 0 and v is 1.

train_data = data[idx,]
test_data = data[-idx,]

model <- train(Species ~.,data = train_data,family = "binomial",method = "glmnet")

pred <- predict(model, newdata = test_data, type = "prob")

So we can see the probability of each class:

head(pred)
          o          v
1 0.8367885 0.16321154
2 0.7970508 0.20294924
3 0.6383656 0.36163437
4 0.9510763 0.04892370
5 0.9370721 0.06292789

To calculate the score:

f_t = pred[,2]
o_t = as.numeric(test_data$Species)-1
mean((f_t - o_t)^2)
[1] 0.32
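
The as.numeric(...) - 1 step relies on the positive class being the second factor level. A slightly more explicit variant, sketched here on made-up data (the data frame probs and the factor obs are illustrative stand-ins for pred and test_data$Species), selects the probability column by name and builds the 0/1 outcome from a comparison:

```r
# Illustrative stand-ins for the class-probability output and the true labels.
obs <- factor(c("o", "v", "v", "o"), levels = c("o", "v"))
probs <- data.frame(o = c(0.8, 0.3, 0.6, 0.9),
                    v = c(0.2, 0.7, 0.4, 0.1))

positive <- levels(obs)[2]            # "v", the class treated as 1
f_t <- probs[[positive]]              # forecast probability of the positive class
o_t <- as.integer(obs == positive)    # 1 where the observation is positive, else 0
brier <- mean((f_t - o_t)^2)
brier
```

Picking the column by level name avoids silently scoring the wrong class if the column order ever differs from the factor's level order.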

Answer 2

I use the Brier score to tune my models in caret for binary classification. I ensure that the "positive" class is the second class, which is the default when you label your response "0:1". Then I created this master summary function, based on caret's own suite of summary functions, to return all the metrics I want to see:

BigSummary <- function (data, lev = NULL, model = NULL) {
  pr_auc <- try(MLmetrics::PRAUC(data[, lev[2]],
                                 ifelse(data$obs == lev[2], 1, 0)),
                silent = TRUE)
  brscore <- try(mean((data[, lev[2]] - ifelse(data$obs == lev[2], 1, 0)) ^ 2),
               silent = TRUE)
  rocObject <- try(pROC::roc(ifelse(data$obs == lev[2], 1, 0), data[, lev[2]],
                             direction = "<", quiet = TRUE), silent = TRUE)
  if (inherits(pr_auc, "try-error")) pr_auc <- NA
  if (inherits(brscore, "try-error")) brscore <- NA
  rocAUC <- if (inherits(rocObject, "try-error")) {
    NA
  } else {
    rocObject$auc
  }
  tmp <- unlist(e1071::classAgreement(table(data$obs,
                                            data$pred)))[c("diag", "kappa")]
  out <- c(Acc = tmp[[1]],
           Kappa = tmp[[2]],
           AUCROC = rocAUC,
           AUCPR = pr_auc,
           Brier = brscore,
           Precision = caret:::precision.default(data = data$pred,
                                                 reference = data$obs,
                                                 relevant = lev[2]),
           Recall = caret:::recall.default(data = data$pred,
                                           reference = data$obs,
                                           relevant = lev[2]),
           F = caret:::F_meas.default(data = data$pred, reference = data$obs,
                                      relevant = lev[2]))
  out
}

Now I can simply pass summaryFunction = BigSummary in trainControl (together with classProbs = TRUE, so that the class-probability columns the function reads are available) and then metric = "Brier", maximize = FALSE in the train call.
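
For readers who only need the Brier score, a stripped-down sketch of the same idea follows. The name brierSummary and the hand-built data frame are assumptions for illustration. The function keeps the (data, lev, model) signature caret expects from a summaryFunction, and the train wiring is left as comments because it needs caret and glmnet installed:

```r
# Hypothetical, Brier-only summary function with the (data, lev, model)
# signature that caret expects from a summaryFunction.
brierSummary <- function(data, lev = NULL, model = NULL) {
  positive <- lev[2]                          # second level is the "1" class
  o_t <- as.integer(data$obs == positive)     # observed outcome as 0/1
  f_t <- data[, positive]                     # predicted probability of that class
  c(Brier = mean((f_t - o_t)^2))
}

# Quick check on a hand-built data frame shaped like the one caret passes in
# (columns obs, pred, plus one probability column per class).
fake <- data.frame(obs  = factor(c("o", "v", "v"), levels = c("o", "v")),
                   pred = factor(c("o", "v", "o"), levels = c("o", "v")),
                   o    = c(0.9, 0.2, 0.7),
                   v    = c(0.1, 0.8, 0.3))
brierSummary(fake, lev = levels(fake$obs))

# Sketch of the wiring into caret (not run here; assumes caret and glmnet are
# installed and that train_data has a two-level factor response Species):
# ctrl <- trainControl(method = "cv", number = 5,
#                      classProbs = TRUE, summaryFunction = brierSummary)
# model <- train(Species ~ ., data = train_data, method = "glmnet",
#                trControl = ctrl, metric = "Brier", maximize = FALSE)
```

Because a lower Brier score is better, maximize = FALSE tells train to pick the tuning parameters with the smallest value.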
