How do I optimize PPV in caret for a random forest in R?

Question

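I'm interested in creating a model that optimizes PPV. I've created an RF model (below) that outputs a confusion matrix, from which I manually calculate sensitivity, specificity, PPV, NPV, and F1. I know that accuracy is what is being optimized right now, but I'm willing to give up sensitivity and specificity to get a much higher PPV.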

library(caret)

data_ctrl_null <- trainControl(method = "cv", number = 5, classProbs = TRUE,
                               summaryFunction = twoClassSummary,
                               savePredictions = TRUE, sampling = NULL)

set.seed(5368)

model_htn_df <- train(outcome ~ ., data = htn_df, ntree = 1000,
                      tuneGrid = data.frame(mtry = 38),
                      trControl = data_ctrl_null, method = "rf",
                      preProc = c("center", "scale"), metric = "ROC",
                      importance = TRUE)

model_htn_df$finalModel  # prints the OOB confusion matrix

Result:

Call:
  randomForest(x = x, y = y, ntree = 1000, mtry = param$mtry, importance = TRUE) 
           Type of random forest: classification
                 Number of trees: 1000
  No. of variables tried at each split: 38

    OOB estimate of  error rate: 16.2%
    Confusion matrix:
      no yes class.error
 no  274  19  0.06484642
 yes  45  57  0.44117647

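My manual calculations: sens = 55.9%, spec = 93.5%, PPV = 75.0%, NPV = 85.9%. (The confusion matrix lists my no and yes outcomes in swapped order, so I swap the numbers as well when calculating the performance measures.)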
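For reference, the manual calculation can be sketched in base R. This assumes "yes" is the positive class and that rows of the randomForest confusion matrix are observed classes while columns are predicted classes (which matches the per-row class.error values); the object names `cm` and `metrics` are made up for illustration.

```r
# OOB confusion matrix copied from the output: rows = observed, columns = predicted
cm <- matrix(c(274, 19,
                45, 57),
             nrow = 2, byrow = TRUE,
             dimnames = list(observed  = c("no", "yes"),
                             predicted = c("no", "yes")))

# Assumption: "yes" is the positive class
tp <- cm["yes", "yes"]; fn <- cm["yes", "no"]
fp <- cm["no", "yes"];  tn <- cm["no", "no"]

metrics <- c(sens = tp / (tp + fn),   # 57 / 102
             spec = tn / (tn + fp),   # 274 / 293
             ppv  = tp / (tp + fp),   # 57 / 76
             npv  = tn / (tn + fn))   # 274 / 319

print(round(100 * metrics, 1))
```

The rounded percentages agree with the figures quoted above (55.9, 93.5, 75.0, 85.9).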

So what do I need to do to get PPV = 90%?

There is a similar question, but I don't really follow it.

Answer 1

We define a function that calculates PPV and returns the result as a named value:

PPV <- function (data,lev = NULL,model = NULL) {
   value <- posPredValue(data$pred,data$obs, positive = lev[1])
   c(PPV=value)
}
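To see what such a summary function returns, it can be called by hand on a small data frame with `obs` and `pred` columns, which is the shape caret passes to a `summaryFunction`. The toy data frame `toy` below is invented for illustration, and the function is repeated so the snippet runs on its own.

```r
library(caret)

# Same summary function as above, repeated so this snippet is self-contained
PPV <- function(data, lev = NULL, model = NULL) {
  value <- posPredValue(data$pred, data$obs, positive = lev[1])
  c(PPV = value)
}

# Hypothetical toy predictions: 3 predicted "yes", of which 2 are truly "yes"
toy <- data.frame(
  obs  = factor(c("yes", "yes", "no",  "no", "yes"), levels = c("yes", "no")),
  pred = factor(c("yes", "no",  "yes", "no", "yes"), levels = c("yes", "no"))
)

res <- PPV(toy, lev = levels(toy$obs))
print(res)  # named value; here TP / (TP + FP) = 2 / 3
```

Because `positive = lev[1]`, the first factor level is treated as the positive class, so the level order of the outcome matters.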

Suppose we have the following data:

library(randomForest)
library(caret)
data=iris
data$Species = ifelse(data$Species == "versicolor","versi","others")
trn = sample(nrow(iris),100)

Then we train, specifying PPV as the metric:

mdl <- train(Species ~ ., data = data[trn,],
             method = "rf",
             metric = "PPV",
             trControl = trainControl(summaryFunction = PPV, 
                                      classProbs = TRUE))

Random Forest 

100 samples
  4 predictor
  2 classes: 'others', 'versi' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 100, 100, 100, 100, 100, 100, ... 
Resampling results across tuning parameters:

  mtry  PPV      
  2     0.9682811
  3     0.9681759
  4     0.9648426

PPV was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.

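Now you can see that the model is selected on PPV. But you cannot force training to reach a PPV of 0.9. It really depends on the data: if your independent variables have no predictive power, PPV will not improve however much you train, right?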
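The point about predictive power can be illustrated with a small base R simulation. This is only a sketch under made-up assumptions, not part of the caret code above: the scores, the 25% prevalence, the cutoffs, and the helper `ppv_at` are all hypothetical.

```r
# Illustrative simulation; every name and number here is invented
set.seed(1)
n   <- 5000
obs <- rbinom(n, 1, 0.25)   # 1 = positive, roughly 25% prevalence

# One score unrelated to the outcome, one that is shifted upward for positives
score_noise       <- runif(n)
score_informative <- plogis(rnorm(n, mean = ifelse(obs == 1, 1.5, -1.5)))

# PPV = TP / (TP + FP) when cases scoring at or above the cutoff are called positive
ppv_at <- function(score, obs, cutoff) {
  called_pos <- score >= cutoff
  sum(called_pos & obs == 1) / sum(called_pos)
}

cutoffs <- c(0.5, 0.7, 0.9)
ppv_table <- data.frame(
  cutoff          = cutoffs,
  ppv_noise       = sapply(cutoffs, function(k) ppv_at(score_noise, obs, k)),
  ppv_informative = sapply(cutoffs, function(k) ppv_at(score_informative, obs, k))
)
print(ppv_table)
```

In this setup the uninformative score should give a PPV close to the prevalence at every cutoff, while the informative score gives a clearly higher PPV. How high PPV can go is therefore bounded by what the predictors carry, not by the choice of metric.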
