Getting in-sample and out-of-sample probability estimates in R caret

Question

I have some data similar to this:

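data(Titanic)
df <- data.frame(Titanic, stringsAsFactors=TRUE)  # one row per Class/Sex/Age/Survived cell, with a Freq count
# need one row per passenger: repeat each cell Freq times and drop the Freq column
df <- df[rep(seq_len(nrow(df)), df[,"Freq"]), which(names(df)!="Freq")]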
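As a standalone sanity check (a base R sketch; the names `counts` and the row-name reset are additions, not part of the original code), the expansion should give exactly one row per person aboard, 2201 in total. Because the Freq column is dropped, no case weights are needed in the model afterwards.

```r
# Expand the 4-way Titanic contingency table into one row per passenger
data(Titanic)
counts <- as.data.frame(Titanic, stringsAsFactors = TRUE)  # 32 cells, each with a Freq count
df <- counts[rep(seq_len(nrow(counts)), counts$Freq), names(counts) != "Freq"]
rownames(df) <- NULL  # drop the "1.1", "1.2", ... row names created by rep()

nrow(df)            # 2201 passengers and crew
table(df$Survived)  # No: 1490, Yes: 711
```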

I trained a logistic regression model in caret using repeated cross-validation, like this:

library(caret) 

tc <- trainControl(method="repeatedcv", number=10, repeats=3, 
                   returnData=TRUE, savePredictions=TRUE, classProbs=TRUE)

# no weights argument: Freq was dropped above and df already has one row per passenger
glmFit <- train(Survived ~ Class + Sex + Age, data = df,
                method="glm", family="binomial",
                trControl = tc)

summary(glmFit)

For each row of the data frame, I would like to obtain the average in-sample fitted probability and the average out-of-sample predicted probability. With 10-fold CV x 3 repeats, these are averages of 27 and 3 values per row, respectively.
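The 27 / 3 split follows from how repeated k-fold CV works: within each repeat, every row falls into exactly one of the 10 folds, so it is held out in 1 fit and used for training in the other 9. The small base R simulation below confirms the counts; it assigns folds by plain random sampling, which is an assumption standing in for caret's stratified folds.

```r
# Count, for every row, how many of the 10 x 3 = 30 resampling fits
# hold it out (out-of-sample) versus train on it (in-sample).
set.seed(2015)
n <- 100; k <- 10; repeats <- 3

held_out  <- integer(n)
in_sample <- integer(n)
for (r in seq_len(repeats)) {
  fold_id <- sample(rep_len(seq_len(k), n))  # random fold label for each row in this repeat
  for (f in seq_len(k)) {
    test <- fold_id == f
    held_out  <- held_out + test    # row is in the hold-out fold of this fit
    in_sample <- in_sample + !test  # row is in the training set of this fit
  }
}
table(held_out)   # every row is held out 3 times
table(in_sample)  # every row is used for training 27 times
```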

I would like to append each row's average in-sample and out-of-sample probability estimates to the data frame, so that it looks like the last two columns of:

> df_appended
  Class Sex   Age Survived training_p_surv_est testing_p_surv_est
    3rd   M Child        0               0.251              0.259
    3rd   M Child        1               0.251              0.259
    2nd   M Child        1               0.324              0.319
    2nd   M Child        0               0.324              0.319

Following ?trainControl, I have saved the hold-out predictions for each resample with savePredictions=TRUE (and set classProbs=TRUE, since I want raw probabilities rather than classes).
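Note that savePredictions only stores hold-out predictions; caret does not keep the fitted values from each resample's training set, so the in-sample averages have to be recomputed. A sketch of one way to do it: refit the same glm on each training index set and average every row's fitted probabilities. In a real caret run the training indices per resample should be available as glmFit$control$index; here a hypothetical helper, make_train_index(), builds equivalent (non-stratified) fold lists in base R so the example runs without caret.

```r
# In-sample (training-set) fitted probabilities, averaged over resamples.
data(Titanic)
counts <- as.data.frame(Titanic, stringsAsFactors = TRUE)
df <- counts[rep(seq_len(nrow(counts)), counts$Freq), names(counts) != "Freq"]
rownames(df) <- NULL

# Hypothetical helper: training-row indices for repeated k-fold CV,
# standing in for glmFit$control$index (plain random folds, not stratified).
make_train_index <- function(n, k = 10, repeats = 3) {
  idx <- list()
  for (r in seq_len(repeats)) {
    fold_id <- sample(rep_len(seq_len(k), n))
    for (f in seq_len(k)) {
      idx[[sprintf("Fold%02d.Rep%d", f, r)]] <- which(fold_id != f)
    }
  }
  idx
}

set.seed(1)
train_index <- make_train_index(nrow(df))  # with caret: train_index <- glmFit$control$index

p_sum <- numeric(nrow(df))  # running sum of fitted probabilities per row
p_n   <- integer(nrow(df))  # number of fits in which the row was a training row
for (rows in train_index) {
  fit <- glm(Survived ~ Class + Sex + Age, family = binomial, data = df[rows, ])
  # fitted() gives P(Survived == "Yes") for the rows this model was trained on
  p_sum[rows] <- p_sum[rows] + as.numeric(fitted(fit))
  p_n[rows]   <- p_n[rows] + 1L
}
df$training_p_surv_est <- p_sum / p_n
head(df)
```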

How do I access the in-sample and out-of-sample predictions? Following ?predict.train, I tried:

extractProb(list(glmFit)) 
#Error in eval(expr, envir, enclos) : object 'Class2nd' not found

Many thanks.

Answer 1

Take a look at your glmFit object: it contains an element named 'pred' (a data frame).

head(glmFit$pred)

It holds the predicted class probabilities as well as the predicted class for every held-out row in each fold of each repeat.
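To turn those hold-out rows into one out-of-sample average per data row, the class-probability column can be grouped by rowIndex, which points back to the row of the data passed to train. The sketch below uses a small hypothetical stand-in for glmFit$pred (only the columns it needs, with made-up probabilities) and a hypothetical 3-row data frame df_small, so that it runs without caret; with the real object you would use pred <- glmFit$pred.

```r
# Hypothetical stand-in for glmFit$pred: each row is one hold-out prediction.
# rowIndex points back to a row of the training data; "Yes" is P(Survived == "Yes").
pred <- data.frame(
  rowIndex = c(1, 2, 3, 1, 2, 3, 1, 2, 3),
  Yes      = c(0.20, 0.40, 0.70, 0.30, 0.50, 0.80, 0.25, 0.45, 0.75),
  Resample = c("Fold03.Rep1", "Fold07.Rep1", "Fold03.Rep1",
               "Fold01.Rep2", "Fold01.Rep2", "Fold09.Rep2",
               "Fold05.Rep3", "Fold02.Rep3", "Fold08.Rep3")
)
# with caret: pred <- glmFit$pred

df_small <- data.frame(id = 1:3)  # hypothetical 3-row training data

# Average each row's hold-out probabilities (3 per row with 10-fold CV x 3 repeats);
# indexing by row number as character aligns the result with the data rows.
oos <- tapply(pred$Yes, pred$rowIndex, mean)
df_small$testing_p_surv_est <- as.numeric(oos[as.character(seq_len(nrow(df_small)))])
df_small$testing_p_surv_est  # 0.25 0.45 0.75
```

A row that never appears in pred would get NA here; with repeated k-fold CV every row is held out once per repeat, so that should not happen.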

Cheers.

Answer 1 (score 0), 2015-06-03 18:45:37