R: Caret package: Brier Score
I want to perform a logistic regression with the train() function from the caret package. My model looks something like this:
model <- train(Y ~.,
data = train_data,
family = "binomial",
method = "glmnet")
With the resulting model, I want to make predictions:
pred <- predict(model, newdata = test_data, s = "lambda.min", type = "prob")
Now I want to evaluate how good the model's predictions are compared with the actual test data. I know how to obtain the ROC curve and the AUC for this. However, I am also interested in obtaining the Brier score, whose formula is almost identical to the MSE.

The problem I am facing is that the type argument in predict only allows "prob" (or "class", which I am not interested in). This gives the probability of a prediction being a ONE (e.g. 0.64) and the complementary probability of it being a ZERO (e.g. 0.36). For the Brier score, however, I need one probability estimate per prediction that contains the information of both (e.g. a value above 0.5 would indicate a 1 and a value below 0.5 would indicate a 0).

I have not found any solution for obtaining the Brier score in the caret package. I am aware that with cv.glmnet from the glmnet package, the predict function accepts "response" for the type argument, which would solve my problem. However, as a matter of personal preference, I would like to stay with the caret package. Thanks for the help!
If we go by the Wikipedia definition of the Brier score:
The most common formulation of the Brier score is
$$BS = \frac{1}{N} \sum_{t=1}^{N} (f_t - o_t)^2$$
where f_t is the probability that was forecast, o_t the actual outcome of the event (0 or 1) and N is the number of forecasting instances.
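As a quick illustration of the formula, here is a minimal sketch in base R. The forecasts and outcomes are made-up numbers, not taken from the question's data; the second calculation shows the familiar reference point that a constant forecast of 0.5 always scores 0.25.

```r
# Toy forecasts (probability of the positive class) and observed outcomes (0/1).
# These numbers are invented for illustration only.
f_t <- c(0.9, 0.2, 0.6)
o_t <- c(1, 0, 0)

# Brier score: mean squared difference between forecast and outcome.
brier <- mean((f_t - o_t)^2)
print(brier)

# Reference point: an uninformative forecast of 0.5 for every case
# scores 0.25 regardless of the outcomes, since (0.5 - o)^2 = 0.25 for o in {0, 1}.
brier_half <- mean((rep(0.5, length(o_t)) - o_t)^2)
print(brier_half)
```

Only one probability per case is needed: the probability of the class coded as 1. The complementary column returned alongside it carries no extra information.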
In R, if your label is a factor, a logistic regression always models the probability of the second level, so you just compute the predicted probability and the 0/1 outcome with respect to that level. For example:
library(caret)
idx = sample(nrow(iris),100)
data = iris
data$Species = factor(ifelse(data$Species=="versicolor","v","o"))
levels(data$Species)
[1] "o" "v"
In this case, o is 0 and v is 1.
train_data = data[idx,]
test_data = data[-idx,]
model <- train(Species ~.,data = train_data,family = "binomial",method = "glmnet")
pred <- predict(model, newdata = test_data, type = "prob")
So we can see the probability of each class:
head(pred)
o v
1 0.8367885 0.16321154
2 0.7970508 0.20294924
3 0.6383656 0.36163437
4 0.9510763 0.04892370
5 0.9370721 0.06292789
To calculate the score:
f_t = pred[,2]
o_t = as.numeric(test_data$Species)-1
mean((f_t - o_t)^2)
[1] 0.32
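The same calculation can be wrapped in a small helper so that the positive class is chosen by name rather than through as.numeric() on the factor. This is only a sketch: brier_score is a hypothetical function name, not part of caret, and the example values are made up.

```r
# Hypothetical helper (not part of caret): Brier score from a vector of
# predicted probabilities and a two-level factor of observed classes.
brier_score <- function(prob, obs, positive = levels(obs)[2]) {
  stopifnot(is.factor(obs), nlevels(obs) == 2,
            length(prob) == length(obs), positive %in% levels(obs))
  o_t <- as.numeric(obs == positive)  # 1 for the positive class, 0 otherwise
  mean((prob - o_t)^2)
}

# Made-up example: probabilities of class "v" for four observations.
obs  <- factor(c("v", "o", "o", "v"), levels = c("o", "v"))
prob <- c(0.9, 0.2, 0.6, 0.4)
score <- brier_score(prob, obs)
print(score)
```

With the objects from the example above, and assuming pred holds the class probabilities as shown, the call would be brier_score(pred[, "v"], test_data$Species, positive = "v").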
I use the Brier score to tune my models in caret for binary classification. I ensure that the "positive" class is the second class, which is the default when you label your response "0:1". Then I created this master summary function, based on caret's own suite of summary functions, to return all the metrics I want to see:
BigSummary <- function (data, lev = NULL, model = NULL) {
  # Area under the precision-recall curve for the second ("positive") class
  pr_auc <- try(MLmetrics::PRAUC(data[, lev[2]],
                                 ifelse(data$obs == lev[2], 1, 0)),
                silent = TRUE)
  # Brier score: mean squared error of the positive-class probability
  brscore <- try(mean((data[, lev[2]] - ifelse(data$obs == lev[2], 1, 0)) ^ 2),
                 silent = TRUE)
  rocObject <- try(pROC::roc(ifelse(data$obs == lev[2], 1, 0), data[, lev[2]],
                             direction = "<", quiet = TRUE), silent = TRUE)
  # Fall back to NA if any metric failed to compute
  if (inherits(pr_auc, "try-error")) pr_auc <- NA
  if (inherits(brscore, "try-error")) brscore <- NA
  rocAUC <- if (inherits(rocObject, "try-error")) {
    NA
  } else {
    rocObject$auc
  }
  # Accuracy and kappa from the confusion table
  tmp <- unlist(e1071::classAgreement(table(data$obs,
                                            data$pred)))[c("diag", "kappa")]
  out <- c(Acc = tmp[[1]],
           Kappa = tmp[[2]],
           AUCROC = rocAUC,
           AUCPR = pr_auc,
           Brier = brscore,
           Precision = caret:::precision.default(data = data$pred,
                                                 reference = data$obs,
                                                 relevant = lev[2]),
           Recall = caret:::recall.default(data = data$pred,
                                           reference = data$obs,
                                           relevant = lev[2]),
           F = caret:::F_meas.default(data = data$pred, reference = data$obs,
                                      relevant = lev[2]))
  out
}
Now I can simply pass summaryFunction = BigSummary in trainControl and then metric = "Brier", maximize = FALSE in the train call.
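A rough sketch of how this wiring might look, using a cut-down summary function that returns only the Brier score. brierSummary and the toy data frame are hypothetical names introduced here for illustration. The sketch assumes classProbs = TRUE in trainControl, which is what makes the per-class probability columns available to the summary function, and it assumes factor levels that are valid R variable names (such as "o" and "v"), which caret requires when class probabilities are requested. The train call only runs if caret and glmnet are installed.

```r
# Hypothetical minimal summary function: Brier score for the second factor level.
brierSummary <- function(data, lev = NULL, model = NULL) {
  o_t <- ifelse(data$obs == lev[2], 1, 0)
  c(Brier = mean((data[, lev[2]] - o_t)^2))
}

# Quick check on a hand-made data frame shaped like the resampling output
# caret passes to a summary function (obs, pred, one column per class).
toy <- data.frame(obs  = factor(c("v", "o"), levels = c("o", "v")),
                  pred = factor(c("v", "o"), levels = c("o", "v")),
                  o = c(0.1, 0.8),
                  v = c(0.9, 0.2))
toy_brier <- brierSummary(toy, lev = c("o", "v"))
print(toy_brier)

# Tuning on the Brier score; skipped when the packages are not available.
if (requireNamespace("caret", quietly = TRUE) &&
    requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)
  d <- iris
  d$Species <- factor(ifelse(d$Species == "versicolor", "v", "o"))
  ctrl <- caret::trainControl(method = "cv", number = 5,
                              classProbs = TRUE,  # needed for probability columns
                              summaryFunction = brierSummary)
  fit <- caret::train(Species ~ ., data = d,
                      method = "glmnet", family = "binomial",
                      metric = "Brier", maximize = FALSE,  # lower Brier is better
                      trControl = ctrl)
  print(fit$results[, c("alpha", "lambda", "Brier")])
}
```

Swapping brierSummary for BigSummary in trainControl should give the full set of metrics, provided MLmetrics, pROC and e1071 are installed as well.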