Different model accuracy when rerunning preProcess(), predict() and train() in R (caret)
The data below is only an example. What confuses me is the behaviour of the operations themselves, on this or any other data:
library(caret)
library(AppliedPredictiveModeling)  # provides the AlzheimerDisease data
set.seed(3433)
data(AlzheimerDisease)
complete <- data.frame(diagnosis, predictors)
# 75/25 split, stratified on diagnosis
in_train <- createDataPartition(complete$diagnosis, p = 0.75)[[1]]
training <- complete[in_train, ]
testing <- complete[-in_train, ]
# keep the outcome plus the predictors whose names start with "IL"
predIL <- grep("^IL", names(training))
smalltrain <- training[, c(1, predIL)]
# GLM on the raw IL predictors
fit_noPCA <- train(diagnosis ~ ., method = "glm", data = smalltrain)
# GLM on the principal components that explain 80% of the variance
pre_proc_obj <- preProcess(smalltrain[, -1], method = "pca", thresh = 0.8)
smalltrainsPCs <- predict(pre_proc_obj, smalltrain[, -1])
fit_PCA <- train(x = smalltrainsPCs, y = smalltrain$diagnosis, method = "glm")
fit_noPCA$results$Accuracy
fit_PCA$results$Accuracy
When I run this code, I get one accuracy value for fit_noPCA and another for fit_PCA. But when I rerun the last part of the code:
fit_noPCA <- train(diagnosis ~ ., method = "glm", data = smalltrain)
pre_proc_obj <- preProcess(smalltrain[,-1], method = "pca", thresh = 0.8)
smalltrainsPCs <- predict(pre_proc_obj, smalltrain[,-1])
fit_PCA <- train(x = smalltrainsPCs, y = smalltrain$diagnosis, method = "glm")
fit_noPCA$results$Accuracy
fit_PCA$results$Accuracy
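then I get different accuracy values every time I rerun these 6 lines. Why is that? Is it because I do not reset the seed? And even so, where is the inherent randomness in this procedure?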
By default, train() estimates model performance with bootstrap resampling, as you can see here:
library(caret)
library(AppliedPredictiveModeling)
> fit_noPCA
Generalized Linear Model
251 samples
12 predictor
2 classes: 'Impaired', 'Control'
No pre-processing
Resampling: Bootstrapped (25 reps)
Summary of sample sizes: 251, 251, 251, 251, 251, 251, ...
Resampling results:
  Accuracy   Kappa
  0.6870006  0.04107016
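So the bootstrap samples are different for every call to train(), and the reported accuracy, which is averaged over those resamples, changes with them. To get the same result, set the seed immediately before running train():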
set.seed(111)
fit_PCA <- train(x = smalltrainsPCs, y = smalltrain$diagnosis, method = "glm",
                 trControl = trainControl(method = "boot", number = 100))
fit_PCA$results$Accuracy
[1] 0.6983512
set.seed(112)
fit_PCA <- train(x = smalltrainsPCs, y = smalltrain$diagnosis, method = "glm",
                 trControl = trainControl(method = "boot", number = 100))
fit_PCA$results$Accuracy
[1] 0.6991537
set.seed(111)
fit_PCA <- train(x = smalltrainsPCs, y = smalltrain$diagnosis, method = "glm",
                 trControl = trainControl(method = "boot", number = 100))
fit_PCA$results$Accuracy
[1] 0.6983512
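To see the randomness without fitting any model, you can draw bootstrap resamples directly with caret's createResample(). This is only a small sketch: the outcome vector y is a made-up toy factor rather than the Alzheimer data, and I am assuming these resamples behave like the ones train() draws for method = "boot".

```r
library(caret)

# Hypothetical toy outcome, only for illustration
y <- factor(rep(c("Impaired", "Control"), times = c(10, 30)))

set.seed(111)
boot_a <- createResample(y, times = 3)  # three bootstrap index sets

set.seed(112)
boot_b <- createResample(y, times = 3)  # different seed

set.seed(111)
boot_c <- createResample(y, times = 3)  # same seed as boot_a

# Same seed gives the same row indices in every resample
identical(boot_a, boot_c)

# A different seed will, in practice, give different row indices
identical(boot_a, boot_b)
```

Since the accuracy is computed on whichever rows fall outside each resample, different index sets lead to slightly different accuracy estimates even though glm itself is deterministic.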
Alternatively, if you use cross-validation for example, you can define the folds yourself with the index= argument of trainControl.
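A rough sketch of the index= approach, assuming 10-fold cross-validation and reusing the IL predictors from the question (the names small, folds, ctrl, fit_1 and fit_2 are mine, not from the question):

```r
library(caret)
library(AppliedPredictiveModeling)

data(AlzheimerDisease)
complete <- data.frame(diagnosis, predictors)
small <- complete[, c(1, grep("^IL", names(complete)))]

# Create the folds once; returnTrain = TRUE returns the training-row
# indices for each fold, which is what index= expects
set.seed(111)
folds <- createFolds(small$diagnosis, k = 10, returnTrain = TRUE)

# Reuse the same folds for every call to train()
ctrl <- trainControl(method = "cv", index = folds)

fit_1 <- train(diagnosis ~ ., data = small, method = "glm", trControl = ctrl)
fit_2 <- train(diagnosis ~ ., data = small, method = "glm", trControl = ctrl)

# Both fits were resampled on identical folds, so the estimates should agree
all.equal(fit_1$results$Accuracy, fit_2$results$Accuracy)
```

With the folds fixed in ctrl, you no longer need to call set.seed() before each train(), at least for a model like glm that has no randomness of its own.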