
R: Caret package: Brier Score

I want to perform a logistic regression with the train() function from the caret package. My model looks something like this:

model <- train(Y ~.,
  data = train_data,
  family = "binomial",
  method = "glmnet")

With the resulting model, I want to make predictions:

pred <- predict(model, newdata = test_data, s = "lambda.min", type = "prob")

Now I want to evaluate how good the model's predictions are compared with the actual test data. I know how to get the ROC curve and the AUC, but I am also interested in the Brier score, whose formula is almost identical to the MSE.

The problem I am facing is that the type argument in predict only allows "prob" (or "class", which I am not interested in). "prob" gives the probability of a prediction being a one (e.g. 0.64) and the complementary probability of it being a zero (e.g. 0.36). For the Brier score, however, I need one probability estimate per prediction that carries the information of both (e.g. a value above 0.5 would indicate a 1 and a value below 0.5 would indicate a 0).

I have not found any way to get the Brier score in the caret package. I am aware that with cv.glmnet from the glmnet package, the predict function accepts type = "response", which would solve my problem. However, as a matter of personal preference I would like to stay with caret. Thanks for the help!

If we go by the Wikipedia definition of the Brier score:

The most common formulation of the Brier score is

BS = \frac{1}{N} \sum_{t=1}^{N} (f_t - o_t)^2

where f_t is the probability that was forecast, o_t is the actual outcome of the event at instance t (0 or 1), and N is the number of forecasting instances.
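As a minimal sketch of that formula in base R (the helper name brier_score and the toy vectors below are illustrative assumptions, not something caret provides):

```r
# Brier score: mean squared difference between forecast probabilities
# and observed 0/1 outcomes. `brier_score` is a hypothetical helper name.
brier_score <- function(f_t, o_t) {
  stopifnot(length(f_t) == length(o_t),
            all(f_t >= 0 & f_t <= 1),
            all(o_t %in% c(0, 1)))
  mean((f_t - o_t)^2)
}

# Toy data: four forecasts for the positive class and the observed outcomes.
f_t <- c(0.9, 0.2, 0.6, 0.4)
o_t <- c(1, 0, 0, 1)
bs <- brier_score(f_t, o_t)
print(bs)
```

For these toy values the squared errors are 0.01, 0.04, 0.36 and 0.36, so the score is 0.1925. A perfect forecast scores 0, and always forecasting 0.5 scores 0.25.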

In R, if your label is a factor, logistic regression always predicts the probability of the second level, so you calculate both the probability and the 0/1 outcome with respect to that level. For example:

library(caret)
idx = sample(nrow(iris),100)
data = iris
data$Species = factor(ifelse(data$Species=="versicolor","v","o"))
levels(data$Species)
[1] "o" "v"

In this case, o is 0 and v is 1.
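That mapping can also be written out explicitly instead of relying on the integer codes of the factor. A small sketch, using a made-up factor y rather than the iris data:

```r
# A made-up two-level factor; levels are sorted alphabetically by default.
y <- factor(c("v", "o", "o", "v"))
levels(y)                        # "o" "v": "v" is the second level

# 1 when the observation is the second level, 0 otherwise.
o_t <- as.integer(y == levels(y)[2])
o_t                              # 1 0 0 1

# Same result as the as.numeric(y) - 1 shortcut for a two-level factor.
identical(o_t, as.integer(y) - 1L)
```

The explicit comparison has the advantage of still being readable if the levels are later reordered with relevel() or factor(..., levels = ...).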

train_data = data[idx,]
test_data = data[-idx,]

model <- train(Species ~.,data = train_data,family = "binomial",method = "glmnet")

pred <- predict(model, newdata = test_data, type = "prob")

So we can see the predicted probability of each class:

head(pred)
          o          v
1 0.8367885 0.16321154
2 0.7970508 0.20294924
3 0.6383656 0.36163437
4 0.9510763 0.04892370
5 0.9370721 0.06292789

To calculate the score:

f_t = pred[,2]
o_t = as.numeric(test_data$Species)-1
mean((f_t - o_t)^2)
[1] 0.32

I use the Brier score to tune my models in caret for binary classification. I make sure that the "positive" class is the second level, which is the default when you label your response "0:1". Then I created this master summary function, based on caret's own suite of summary functions, to return all the metrics I want to see:

BigSummary <- function (data, lev = NULL, model = NULL) {
  pr_auc <- try(MLmetrics::PRAUC(data[, lev[2]],
                                 ifelse(data$obs == lev[2], 1, 0)),
                silent = TRUE)
  brscore <- try(mean((data[, lev[2]] - ifelse(data$obs == lev[2], 1, 0)) ^ 2),
               silent = TRUE)
  rocObject <- try(pROC::roc(ifelse(data$obs == lev[2], 1, 0), data[, lev[2]],
                             direction = "<", quiet = TRUE), silent = TRUE)
  if (inherits(pr_auc, "try-error")) pr_auc <- NA
  if (inherits(brscore, "try-error")) brscore <- NA
  rocAUC <- if (inherits(rocObject, "try-error")) {
    NA
  } else {
    rocObject$auc
  }
  tmp <- unlist(e1071::classAgreement(table(data$obs,
                                            data$pred)))[c("diag", "kappa")]
  out <- c(Acc = tmp[[1]],
           Kappa = tmp[[2]],
           AUCROC = rocAUC,
           AUCPR = pr_auc,
           Brier = brscore,
           Precision = caret:::precision.default(data = data$pred,
                                                 reference = data$obs,
                                                 relevant = lev[2]),
           Recall = caret:::recall.default(data = data$pred,
                                           reference = data$obs,
                                           relevant = lev[2]),
           F = caret:::F_meas.default(data = data$pred, reference = data$obs,
                                      relevant = lev[2]))
  out
}

Now I can simply pass summaryFunction = BigSummary in trainControl, and then metric = "Brier", maximize = FALSE in the train call.
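A rough sketch of that wiring, using a stripped-down summary function (brierSummary is a hypothetical name, and it returns only the Brier score so the example does not depend on MLmetrics, pROC or e1071). Note that trainControl needs classProbs = TRUE, otherwise the probability columns the summary function reads are not available. The train() part only runs if caret and glmnet are installed:

```r
# Hypothetical, stripped-down summary function that returns only the Brier
# score. caret passes `data` with columns obs, pred and one probability
# column per class level; `lev` holds the class levels.
brierSummary <- function(data, lev = NULL, model = NULL) {
  o_t <- ifelse(data$obs == lev[2], 1, 0)
  c(Brier = mean((data[, lev[2]] - o_t)^2))
}

# Quick check on a hand-built data frame in the shape caret supplies.
toy <- data.frame(obs  = factor(c("v", "o", "o", "v"), levels = c("o", "v")),
                  pred = factor(c("v", "o", "v", "o"), levels = c("o", "v")),
                  o    = c(0.1, 0.8, 0.4, 0.6),
                  v    = c(0.9, 0.2, 0.6, 0.4))
toy_brier <- brierSummary(toy, lev = levels(toy$obs))
print(toy_brier)

# Wiring it into train(); skipped when caret or glmnet is not installed.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("glmnet", quietly = TRUE)) {
  library(caret)
  set.seed(1)
  dat <- iris
  dat$Species <- factor(ifelse(dat$Species == "versicolor", "v", "o"))
  ctrl <- trainControl(method = "cv", number = 5,
                       classProbs = TRUE,   # required for probability columns
                       summaryFunction = brierSummary)
  fit <- train(Species ~ ., data = dat, method = "glmnet",
               family = "binomial", trControl = ctrl,
               metric = "Brier", maximize = FALSE)
  print(fit$results[, c("alpha", "lambda", "Brier")])
}
```

Because maximize = FALSE, train() picks the tuning parameters with the lowest cross-validated Brier score rather than the highest.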
