
How to use a function to produce confusion matrices using the Caret package from nested subsets in a master-list

I want to call confusionMatrix() from the caret package inside my function shuffle100, so that it produces a confusion matrix for each subset (data frame) of a master list generated by classification tree models. My aim is to obtain confusion matrix statistics such as classification accuracy and the kappa metric (desired output below). I am sorry to ask such a simple question, but I cannot figure this out. If anyone can help, many thanks in advance.

Reproducible dummy data can be found at this address:

Reproducible data

Code to produce a nested list of classification tree model predictions and confusion matrices

    library(caret)
    library(e1071)
    library(rpart)

    set.seed(1235)

    shuffle100 <- lapply(seq(10), function(n) { # produce 10 different shuffled data frames
      subset <- my_data[sample(nrow(my_data), 80), ] # shuffle 80 rows in the data frame
      subset_idx <- sample(1:nrow(subset), replace = FALSE)
      subset <- subset[subset_idx, ]
      subset_resampled_idx <- createDataPartition(subset_idx, times = 1, p = 0.7, list = FALSE) # partition the data frame into 70% training and 30% test subsets
      subset_resampled <- subset[subset_resampled_idx, ] # 70% training data
      ct_mod <- rpart(Family ~ ., data = subset_resampled, method = "class", control = rpart.control(cp = 0.005)) # 10 classification tree models
      ct_pred <- predict(ct_mod, newdata = subset[, 2:13])
      confusionMatrix(ct_pred, norm$Family) # 10 confusion matrices
    })

Error messages

        Error in sort.list(y) : 'x' must be atomic for 'sort.list'
        Have you called 'sort' on a list?
        Called from: sort.list(y)
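
One plausible reading of this error, offered as an assumption because `my_data` and `norm` are not shown, is that `confusionMatrix()` is receiving something other than two factors: `predict()` on an rpart model returns a matrix of class probabilities unless `type = "class"` is given, and the reference should be the `Family` column of the held-out rows rather than a separate object. A minimal sketch of that idea follows, using synthetic data in place of the real `my_data` (the column names `x1` to `x12`, the signal added to `x1`, and the name `shuffle10` are hypothetical):

```r
library(caret)
library(rpart)

set.seed(1235)

# Hypothetical stand-in for my_data: a two-level factor plus 12 numeric predictors
n <- 200
predictors <- matrix(rnorm(n * 12), ncol = 12)
colnames(predictors) <- paste0("x", 1:12)
my_data <- data.frame(
  Family = factor(sample(c("G8", "V4"), n, replace = TRUE)),
  predictors
)
# Add some signal so the tree has something to split on
my_data$x1 <- my_data$x1 + ifelse(my_data$Family == "G8", 1.5, 0)

shuffle10 <- lapply(seq_len(10), function(i) {
  subset <- my_data[sample(nrow(my_data), 80), ]
  # Stratify the 70/30 split on the outcome, not on the row indices
  train_idx <- createDataPartition(subset$Family, p = 0.7, list = FALSE)
  train <- subset[train_idx, ]
  test <- subset[-train_idx, ]
  ct_mod <- rpart(Family ~ ., data = train, method = "class",
                  control = rpart.control(cp = 0.005))
  # type = "class" returns a factor of predicted labels, not probabilities
  ct_pred <- predict(ct_mod, newdata = test, type = "class")
  # Compare predictions with the true labels of the same held-out rows
  confusionMatrix(ct_pred, test$Family)
})

# One accuracy value per resampled model
accuracies <- sapply(shuffle10, function(cm) cm$overall[["Accuracy"]])
```

Each element of `shuffle10` is then a `confusionMatrix` object that prints in the layout shown under the desired outcome.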

Desired outcome

    Confusion Matrix and Statistics

              Reference
    Prediction G8 V4
            G8 42 12
            V4  8 18

                   Accuracy : 0.75
                     95% CI : (0.6406, 0.8401)
        No Information Rate : 0.625
        P-Value [Acc > NIR] : 0.01244

                      Kappa : 0.4521
     Mcnemar's Test P-Value : 0.50233

                Sensitivity : 0.8400
                Specificity : 0.6000
             Pos Pred Value : 0.7778
             Neg Pred Value : 0.6923
                 Prevalence : 0.6250
             Detection Rate : 0.5250
       Detection Prevalence : 0.6750
          Balanced Accuracy : 0.7200

           'Positive' Class : G8
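
For reference, the individual statistics in a printout like this can be read from the object that `confusionMatrix()` returns. The sketch below rebuilds the 2 x 2 table above by hand (the object names `tab` and `cm` are illustrative) and pulls out accuracy, kappa and sensitivity from the `overall` and `byClass` components:

```r
library(caret)

# Rebuild the table shown above: rows are predictions, columns are the reference
tab <- as.table(matrix(
  c(42, 8, 12, 18), nrow = 2,
  dimnames = list(Prediction = c("G8", "V4"), Reference = c("G8", "V4"))
))

cm <- confusionMatrix(tab)

# Overall statistics are stored in a named numeric vector
accuracy <- cm$overall[["Accuracy"]]
kappa <- cm$overall[["Kappa"]]

# Per-class statistics, computed for the 'positive' class (the first level, G8)
sensitivity <- cm$byClass[["Sensitivity"]]

round(c(Accuracy = accuracy, Kappa = kappa, Sensitivity = sensitivity), 4)
```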

Here is a function to produce confusion matrices from sub-lists (data frames) in a master list produced from classification tree models, using the function confusionMatrix in the caret package.

    # shuffle100 is assumed here to hold one data frame of class probabilities per model
    # Add four new columns to each data frame:
    # (1) `Predicted`
    # (2) `Actual`
    # (3) `Binary`
    # (4) `Actual2`

    my_list <- lapply(shuffle100, function(df) {
      if (nrow(df) > 0)
        cbind(df, Predicted = c(""), Actual = c(""), Binary = c(""), Actual2 = c(""))
      else
        cbind(df, Predicted = factor(), Actual = c(""), Binary = c(""), Actual2 = c(""))
    })

    # Set all four new columns to NA (mutate() comes from the dplyr package)
    # `Predicted` = NA
    # `Actual` = NA
    # `Binary` = NA
    # `Actual2` = NA

    library(dplyr)

    Final_lists <- lapply(my_list, function(x) mutate(x, Predicted = NA, Actual = NA, Binary = NA, Actual2 = NA))

    # FILL THE PREDICTED COLUMN

    # For each row, set `Predicted` (column 3) to the name of the class with the
    # higher probability: column 2 if its value exceeds column 1, otherwise column 1

    for (i in seq_along(Final_lists)) {
      for (j in seq_len(nrow(Final_lists[[i]]))) {
        Final_lists[[i]][j, 3] <- names(Final_lists[[i]])[(Final_lists[[i]][j, 2] > Final_lists[[i]][j, 1]) + 1]
      }
    }

    Final_lists
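
The row-by-row loop can also be written as a single vectorised expression. This is a sketch on a small made-up probability table (the `probs` data frame and its values are hypothetical), not a change to the approach above:

```r
# Hypothetical class probabilities, shaped like one data frame in Final_lists
probs <- data.frame(G8 = c(0.9, 0.2, 0.6, 0.5), V4 = c(0.1, 0.8, 0.4, 0.5))

# Same rule as the loop: column 2's name when it is larger, otherwise column 1's
predicted <- names(probs)[(probs[[2]] > probs[[1]]) + 1]

# Equivalent form that also works for more than two classes;
# ties go to the first column, as in the loop
predicted_maxcol <- names(probs)[max.col(as.matrix(probs), ties.method = "first")]

predicted
```

Both expressions give the same labels here; `max.col()` is the more general choice if the tree predicts more than two families.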

    # FILL THE ACTUAL COLUMN

    # Fill the `Actual` column of every data frame in the list with the true
    # class labels, taken from normalised_scores$Family

    Actual <- normalised_scores$Family

    # Iterate over the indices and pass in Final_lists as an extra argument

    Actual_list <- lapply(seq_along(Final_lists),
                          function(i, x) {
                            x[[i]]$Actual <- Actual
                            return(x[[i]])
                          }, Final_lists)

    # FILL THE BINARY COLUMN

    # Iterate over the ten data frames in the outer list and over each row of
    # each data frame; if Predicted == Actual, assign 1 to Binary, else 0

    for (i in seq_along(Actual_list)) {
      for (j in seq_along(Actual_list[[i]]$Predicted)) {
        if (Actual_list[[i]][j, "Predicted"] == Actual_list[[i]][j, "Actual"]) {
          Actual_list[[i]][j, "Binary"] <- 1
        } else {
          Actual_list[[i]][j, "Binary"] <- 0
        }
      }
    }


    # FILL THE ACTUAL2 COLUMN

    # Assign 1 to Actual2 if the actual class is V4, else 0

    for (i in seq_along(Actual_list)) {
      for (j in seq_along(Actual_list[[i]]$Actual)) {
        if (Actual_list[[i]][j, "Actual"] == "V4") {
          Actual_list[[i]][j, "Actual2"] <- 1
        } else {
          Actual_list[[i]][j, "Actual2"] <- 0
        }
      }
    }

    Actual_list
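
The two indicator columns can be filled without nested loops, because the comparisons are already vectorised over rows. A minimal sketch on a made-up list (`Actual_list_demo` and its contents are hypothetical stand-ins for the real list):

```r
# Hypothetical list of two small data frames standing in for Actual_list
Actual_list_demo <- list(
  data.frame(Predicted = c("G8", "V4", "V4"), Actual = c("G8", "G8", "V4"),
             stringsAsFactors = FALSE),
  data.frame(Predicted = c("V4", "G8"), Actual = c("V4", "V4"),
             stringsAsFactors = FALSE)
)

filled <- lapply(Actual_list_demo, function(df) {
  # 1 where the prediction matches the true class, else 0
  df$Binary <- as.integer(df$Predicted == df$Actual)
  # 1 where the true class is V4, else 0
  df$Actual2 <- as.integer(df$Actual == "V4")
  df
})

filled[[1]]
```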

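    # Generate confusion matrices
    # confusionMatrix() expects two factors with the same levels, so the
    # character Predicted column is converted using the levels of Actual

    confusionMatrices <- lapply(Actual_list, function(scores) {
      actual <- factor(scores$Actual)
      predicted <- factor(scores$Predicted, levels = levels(actual))
      confusionMatrix(predicted, actual)
    })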
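
Once the list of confusion matrices exists, the accuracy and kappa of every model can be collected into one table. The sketch below uses randomly generated labels in place of the real predictions (`scores_demo`, `cms` and `summary_stats` are illustrative names), so the values themselves carry no meaning:

```r
library(caret)

set.seed(1)
classes <- c("G8", "V4")

# Hypothetical stand-in for Actual_list: three data frames of random labels
scores_demo <- lapply(1:3, function(i) {
  data.frame(
    Predicted = sample(classes, 40, replace = TRUE),
    Actual = sample(classes, 40, replace = TRUE),
    stringsAsFactors = FALSE
  )
})

# One confusion matrix per data frame, with both columns as factors
cms <- lapply(scores_demo, function(scores) {
  actual <- factor(scores$Actual, levels = classes)
  predicted <- factor(scores$Predicted, levels = classes)
  confusionMatrix(predicted, actual)
})

# One row per model, one column per statistic
summary_stats <- as.data.frame(
  t(sapply(cms, function(cm) cm$overall[c("Accuracy", "Kappa")]))
)

summary_stats
```

The same `sapply()` call applied to `confusionMatrices` would give one row per resampled tree.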
