Calculating AUC of training dataset for glm function in R
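I am trying to find the AUC on the training data for my logistic regression model fitted with glm.
I split the data into training and test sets, fitted a logistic regression model using glm, computed the predicted values, and tried to find the AUC.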
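library(pROC)  # provides roc()
d <- read.csv(file.choose(), header = TRUE)
set.seed(12345)
train <- runif(nrow(d)) < .5  # logical vector: TRUE marks a training row
table(train)
# fit the model on the training rows only
fit <- glm(y ~ ., family = binomial, data = d[train, ])
# predicted probabilities for the training rows
phat <- predict(fit, type = "response")
# ROC curve and AUC on the training data
g <- roc(d$y[train], phat)
plot(g, print.auc = TRUE)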
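To make the training/test distinction concrete, here is a minimal base-R sketch. It uses simulated data as a stand-in for the CSV (the columns x1, x2 and y are made up), and `auc_rank` is a hypothetical helper, not part of any package: it computes the AUC from the rank-sum (Mann-Whitney) identity so that no extra library is needed.

```r
## Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
## labels: 0/1 vector; scores: predicted probabilities.
auc_rank <- function(labels, scores) {
  r  <- rank(scores)                 # average ranks, so ties are handled
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

## Simulated stand-in for the data in the question (assumption)
set.seed(12345)
n <- 400
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- rbinom(n, 1, plogis(0.8 * d$x1 - 0.5 * d$x2))

train <- runif(nrow(d)) < .5         # logical index, as in the question
fit <- glm(y ~ ., family = binomial, data = d[train, ])

## Predicted probabilities on the training rows and on the held-out rows
phat_train <- predict(fit, type = "response")
phat_test  <- predict(fit, newdata = d[!train, ], type = "response")

auc_train <- auc_rank(d$y[train], phat_train)
auc_test  <- auc_rank(d$y[!train], phat_test)
print(c(train = auc_train, test = auc_test))
```

The key point is that the model is fitted on `d[train, ]` and the two AUC values come from two separate sets of predictions; the training AUC is usually the more optimistic of the two.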
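Another user-friendly option is to use the caret library, which makes it very simple to fit and compare regression/classification models in R. The example code below uses the GermanCredit dataset to predict creditworthiness with a logistic regression model. The code is adapted from the following blog post: https://www.r-bloggers.com/evaluating-logistic-regression-models/.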
library(caret)
## example from https://www.r-bloggers.com/evaluating-logistic-regression-models/
data(GermanCredit)
## 60% training / 40% test data
trainIndex <- createDataPartition(GermanCredit$Class, p = 0.6, list = FALSE)
GermanCreditTrain <- GermanCredit[trainIndex, ]
GermanCreditTest <- GermanCredit[-trainIndex, ]
## logistic regression based on 10-fold cross-validation
trainControl <- trainControl(
  method = "cv",
  number = 10,
  classProbs = TRUE,
  summaryFunction = twoClassSummary
)
fit <- train(
  form = Class ~ Age + ForeignWorker + Property.RealEstate + Housing.Own +
    CreditHistory.Critical,
  data = GermanCreditTrain,
  trControl = trainControl,
  method = "glm",
  family = "binomial",
  metric = "ROC"
)
## AUC ROC for training data (cross-validated estimate, reported in the ROC column)
print(fit)
## AUC ROC for test data
## See https://topepo.github.io/caret/measuring-performance.html#measures-for-class-probabilities
predictTest <- data.frame(
  obs = GermanCreditTest$Class, ## observed class labels
  predict(fit, newdata = GermanCreditTest, type = "prob"), ## predicted class probabilities
  pred = predict(fit, newdata = GermanCreditTest, type = "raw") ## predicted class labels
)
twoClassSummary(data = predictTest, lev = levels(predictTest$obs))
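For readers wondering what a 10-fold cross-validated AUC amounts to, here is a rough base-R sketch of the idea. It is not caret's implementation: the data are simulated, `auc_rank` is a hypothetical helper, and the folds are assigned by simple random shuffling, whereas caret, as far as I know, stratifies them by class.

```r
## Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity.
auc_rank <- function(labels, scores) {
  r  <- rank(scores)
  n1 <- sum(labels == 1)
  n0 <- sum(labels == 0)
  (sum(r[labels == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

## Simulated training data (assumption, stands in for GermanCreditTrain)
set.seed(1)
n <- 500
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, 1, plogis(1.0 * dat$x1 - 0.7 * dat$x2))

k <- 10
fold <- sample(rep(seq_len(k), length.out = n))   # random fold label per row

## For each fold: fit on the other k - 1 folds, score the held-out fold
fold_auc <- sapply(seq_len(k), function(i) {
  fit_i  <- glm(y ~ x1 + x2, family = binomial, data = dat[fold != i, ])
  phat_i <- predict(fit_i, newdata = dat[fold == i, ], type = "response")
  auc_rank(dat$y[fold == i], phat_i)
})

cv_auc <- mean(fold_auc)   # average of the per-fold AUC values
print(round(cv_auc, 3))
```

Because every row is scored by a model that never saw it, this estimate is typically lower than the AUC obtained by predicting on the same rows the model was fitted to.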
I like to use the performance command found in the ROCR library.
library(ROCR)
# responsev = response variable; train and d.test are the training and test data frames
d.prediction <- prediction(predict(fit, type = "response"), train$responsev)
d.performance <- performance(d.prediction, measure = "tpr", x.measure = "fpr")
d.test.prediction <- prediction(predict(fit, newdata = d.test, type = "response"), d.test$responsev)
d.test.performance <- performance(d.test.prediction, measure = "tpr", x.measure = "fpr")
# What is the actual numeric performance of our model?
# performance() returns an S4 object; the AUC value is stored in its y.values slot
performance(d.prediction, measure = "auc")@y.values[[1]]
performance(d.test.prediction, measure = "auc")@y.values[[1]]
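If it helps to see what the "tpr" versus "fpr" measures and the "auc" measure represent, here is a small base-R sketch on toy data. The functions `roc_points` and `auc_trapezoid` are hypothetical helpers written for illustration, not ROCR functions: the first sweeps a threshold over the scores, the second integrates the resulting curve with the trapezoidal rule.

```r
## Hypothetical helper: ROC points from a threshold sweep.
## A case is predicted positive when its score is >= the threshold.
roc_points <- function(labels, scores) {
  thr <- c(Inf, sort(unique(scores), decreasing = TRUE))
  tpr <- sapply(thr, function(t) mean(scores[labels == 1] >= t))
  fpr <- sapply(thr, function(t) mean(scores[labels == 0] >= t))
  data.frame(threshold = thr, fpr = fpr, tpr = tpr)
}

## Hypothetical helper: area under the curve by the trapezoidal rule.
auc_trapezoid <- function(fpr, tpr) {
  sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
}

## Toy labels and predicted probabilities (made up for illustration)
labels <- c(0, 0, 1, 0, 1, 1)
scores <- c(0.1, 0.3, 0.35, 0.6, 0.7, 0.9)

roc <- roc_points(labels, scores)
print(roc)

auc <- auc_trapezoid(roc$fpr, roc$tpr)
print(auc)
```

The area equals the fraction of (positive, negative) pairs in which the positive case gets the higher score, which is why the AUC can be read as a ranking measure.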