How to interpret the accuracy of model output with the caret package

Question

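I am using the caret package to train a model and want to obtain the model's accuracy. One common approach I have heard of is confusionMatrix. However, when I run the code below, the trained model reports accuracy values that differ slightly from the one reported by confusionMatrix(). So my question is: which accuracy should I use? And how should I interpret the accuracy that the model prints directly to the console?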

ModelRF_ALL_b <- train(price~.,method="rf",data=datatraining_b)
ModelRF_ALL_b

The console reports the following:

Random Forest 

8143 samples
   8 predictor
   2 classes: '0', '1' 

No pre-processing
Resampling: Bootstrapped (25 reps) 
Summary of sample sizes: 8143, 8143, 8143, 8143, 8143, 8143, ... 
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa    
  2     0.9948108  0.9843501
  4     0.9945824  0.9836512
  7     0.9940732  0.9821099

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.

I can also run confusionMatrix():

confusionMatrix(datatraining_b$price,
predict(ModelRF_ALL_b,datatraining_b))

which gives an accuracy of 1.

Confusion Matrix and Statistics

      Reference
Prediction    0    1
     0 6414    0
     1    0 1729

           Accuracy : 1          
             95% CI : (0.9995, 1)
No Information Rate : 0.7877     
P-Value [Acc > NIR] : < 2.2e-16  

              Kappa : 1          
 Mcnemar's Test P-Value : NA         

        Sensitivity : 1.0000     
        Specificity : 1.0000     
     Pos Pred Value : 1.0000     
     Neg Pred Value : 1.0000     
         Prevalence : 0.7877     
     Detection Rate : 0.7877     
   Detection Prevalence : 0.7877     
  Balanced Accuracy : 1.0000     

   'Positive' Class : 0

Answer 1

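You can interpret the two values as accuracy on the training data estimated with resampling (the train() output) and without resampling (the confusionMatrix() call), respectively.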

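When you fit the model, caret performs 25 bootstrap resamples, as the model output shows. In each resample the model is fitted to 8143 rows drawn with replacement, and accuracy is computed on the rows left out of that draw; the accuracy reported for each mtry is the average over the 25 resamples. For the confusion matrix, you instead use the final model (the one with mtry = 2) to predict the same 8143 training rows it was fitted on. A small difference between the two accuracy figures is therefore expected.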
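To see where the printed accuracy comes from, the per-resample results stored on the fitted object can be inspected. The following is a minimal sketch, not the asker's actual setup: it assumes the randomForest package is installed and uses a two-class subset of iris as a stand-in for datatraining_b, which is not available here.

```r
library(caret)

set.seed(42)

# Stand-in data (assumption): two-class subset of iris instead of datatraining_b
dat <- droplevels(subset(iris, Species != "setosa"))

# 25 bootstrap resamples, matching caret's default shown in the question's output
fit <- train(Species ~ ., data = dat, method = "rf",
             trControl = trainControl(method = "boot", number = 25))

# One row per bootstrap resample for the selected mtry,
# with the Accuracy and Kappa measured on the held-out rows
head(fit$resample)

# The Accuracy printed for the selected mtry is the mean over those resamples
best <- fit$results[fit$results$mtry == fit$bestTune$mtry, ]
mean(fit$resample$Accuracy)
best$Accuracy
```

The last two values should agree, which shows that the console figure is an average of 25 held-out accuracies rather than a single prediction pass over the training set.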

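Be careful when assessing goodness of fit here, because you are training and evaluating the model on the same dataset. Unsurprisingly, you get very high accuracy. It is better to evaluate the final model on an unseen dataset to confirm its performance and to detect possible overfitting.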
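A held-out evaluation along these lines could look like the sketch below. It again assumes randomForest is installed and uses a two-class subset of iris as hypothetical stand-in data; the names train_set and test_set are illustrative only. Note that confusionMatrix() expects the predictions as data and the true labels as reference.

```r
library(caret)

set.seed(42)

# Stand-in data (assumption): two-class subset of iris instead of datatraining_b
dat <- droplevels(subset(iris, Species != "setosa"))

# Stratified 80/20 split into training and test rows
idx <- createDataPartition(dat$Species, p = 0.8, list = FALSE)
train_set <- dat[idx, ]
test_set  <- dat[-idx, ]

# Fit on the training rows only (default: 25 bootstrap resamples)
fit <- train(Species ~ ., data = train_set, method = "rf")

# Accuracy on the rows the model was fitted on (optimistic)
cm_train <- confusionMatrix(data = predict(fit, train_set),
                            reference = train_set$Species)

# Accuracy on rows the model has never seen
cm_test <- confusionMatrix(data = predict(fit, test_set),
                           reference = test_set$Species)

cm_train$overall["Accuracy"]
cm_test$overall["Accuracy"]
```

If the test accuracy is clearly lower than the training accuracy, that gap is a sign of overfitting; the resampled accuracy printed by train() is usually the closer guide to what the test set will show.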
