How do I feed a caret-trained random forest model into the prediction() and performance() functions?

Question

I want to create a precision-recall curve with performance(), but I don't know how to input my data. I followed this example.

attach(ROCR.simple)
pred <- prediction(ROCR.simple$predictions, ROCR.simple$labels)
perf <- performance(pred,"prec","rec")
plot(perf)

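I am trying to reproduce this for my caret-trained RF model, specifically on the training data (I know there are various examples of how to use predict on newdata). I tried this: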

pred <- prediction(rf_train_model$pred$case, rf_train_model$pred$pred)
perf <- performance(pred,"prec","rec")
plot(perf)

My model is below. I tried the approach above because it seemed to match the structure of the ROCR.simple data.

#create model
ctrl <- trainControl(method = "cv",
                     number = 5,
                     savePredictions = TRUE,
                     summaryFunction = twoClassSummary,
                     classProbs = TRUE)
set.seed(3949)
rf_train_model <- train(outcome ~ ., data=df_train, 
                  method= "rf",
                  ntree = 1500, 
                  tuneGrid = data.frame(mtry = 33), 
                  trControl = ctrl, 
                  preProc=c("center","scale"), 
                  metric="ROC",
                  importance=TRUE)

> head(rf_train_model$pred)
     pred     obs      case   control rowIndex mtry Resample
1 control control 0.3173333 0.6826667        4   33    Fold1
2 control control 0.3666667 0.6333333        7   33    Fold1
3 control control 0.2653333 0.7346667       16   33    Fold1
4 control control 0.1606667 0.8393333       18   33    Fold1
5 control control 0.2840000 0.7160000       20   33    Fold1
6    case    case 0.6206667 0.3793333       25   33    Fold1

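This is wrong, because my precision-recall curve goes the wrong way. I am interested in more than just the PRAUC curve (although this is a good source that shows how to make it), so I would like to fix this error. What mistake did I make?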

Answer 1

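If you read the ROCR documentation for prediction():

It has to be declared which class label denotes the negative, and which the positive class. Ideally, labels should be supplied as ordered factor(s), the lower level corresponding to the negative class, the upper level to the positive class. If the labels are factors (unordered), numeric, logical or characters, ordering of the labels is inferred from R's built-in < relation (e.g. 0 < 1, -1 < 1, 'a' < 'b', FALSE < TRUE).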
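To see how that ordering rule plays out for the two class names in the question, here is a minimal base R sketch. The `labels` vector is made up for illustration and is not taken from the model.

```r
# Hypothetical labels using the same two classes as the question
labels <- factor(c("control", "case", "control", "case"))

# Levels of an unordered factor are sorted alphabetically,
# so "case" is the lower level and "control" is the upper level
levels(labels)

# The same ordering follows from R's built-in < on character strings
"case" < "control"

# Converting to logical makes the positive class explicit (FALSE < TRUE)
is_case <- labels == "case"
is_case
```

With the default ordering, "control" would therefore be treated as the positive class unless the labels are recoded.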

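In your case, when you supply rf_train_model$pred$pred, the upper level is still "control", so "control" is treated as the positive class. The simplest fix is to turn the labels into TRUE / FALSE. You should also supply the actual label, rf_train_model$pred$obs, rather than the predicted label. See the example below: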

library(caret)
library(ROCR)
set.seed(100)
df = data.frame(matrix(runif(100*100),ncol=100))
df$outcome = ifelse(runif(100)>0.5,"case","control")

df_train = df[1:80,]
df_test = df[81:100,]

# ctrl is the trainControl object defined in the question
rf_train_model <- train(outcome ~ ., data=df_train, 
                  method= "rf",
                  ntree = 1500, 
                  tuneGrid = data.frame(mtry = 33), 
                  trControl = ctrl, 
                  preProc=c("center","scale"), 
                  metric="ROC",
                  importance=TRUE)

levels(rf_train_model$pred$pred)
[1] "case"    "control"

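# label: observed classes; positive_class: the level to treat as positive;
# prob: predicted probability of the positive class
plotCurve = function(label,positive_class,prob){
  pred = prediction(prob,label==positive_class)
  perf <- performance(pred,"prec","rec")
  plot(perf)
}

# cross-validated hold-out predictions on the training data
plotCurve(rf_train_model$pred$obs,"case",rf_train_model$pred$case)
# predictions on the test set, using the "case" probability column
plotCurve(df_test$outcome,"case",predict(rf_train_model,df_test,type="prob")[,"case"])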
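
An alternative to converting the labels to TRUE / FALSE is the label.ordering argument of prediction(), which takes the negative class first and the positive class second. The following is a minimal sketch under that assumption; `scores` and `obs` are made-up values for illustration, not output from the model above.

```r
library(ROCR)

# Made-up scores and observed labels, for illustration only
scores <- c(0.9, 0.8, 0.7, 0.4, 0.3, 0.1)
obs <- factor(c("case", "case", "control", "case", "control", "control"))

# label.ordering = c(negative class, positive class)
pred <- prediction(scores, obs, label.ordering = c("control", "case"))
perf <- performance(pred, "prec", "rec")
plot(perf)
```

With the caret model, the same call would presumably take rf_train_model$pred$case as the scores and rf_train_model$pred$obs as the labels.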

Answer 1 accepted 2020-06-16 15:13:27