How to use a function to produce confusion matrices with the caret package from nested subsets in a master list
I want to incorporate the function confusionMatrix() from the caret package into the function shuffle100 to produce confusion matrices from subsets (data frames) of a master list produced from classification tree models. My aim is to produce confusion matrix statistics such as classification accuracy, the kappa metric, etc. (desired output below). I am sorry to ask such a simple question, but I cannot figure this out. If anyone can help, many thanks in advance.
Reproducible dummy data can be found at this address:
library(caret)
library(e1071)
library(rpart)
set.seed(1235)
shuffle100 <- lapply(seq(10), function(n){ #produce 10 different shuffled data-frames
  subset <- my_data[sample(nrow(my_data), 80),] #shuffle 80 rows in the data-frame
  subset_idx <- sample(1:nrow(subset), replace = FALSE)
  subset <- subset[subset_idx, ]
  subset_resampled_idx <- createDataPartition(subset_idx, times = 1, p = 0.7, list = FALSE) #partition data-frame into 70 % training and 30 % test subsets
  subset_resampled <- subset[subset_resampled_idx, ] #70 % training data
  ct_mod <- rpart(Family~., data=subset_resampled, method="class", control=rpart.control(cp=0.005)) #10 ct models
  ct_pred <- predict(ct_mod, newdata=subset[,2:13])
  confusionMatrix(ct_pred, norm$Family) #10 confusion matrices
})
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
Called from: sort.list(y)
Confusion Matrix and Statistics

          Reference
Prediction G8 V4
        G8 42 12
        V4  8 18

               Accuracy : 0.75
                 95% CI : (0.6406, 0.8401)
    No Information Rate : 0.625
    P-Value [Acc > NIR] : 0.01244

                  Kappa : 0.4521
 Mcnemar's Test P-Value : 0.50233

            Sensitivity : 0.8400
            Specificity : 0.6000
         Pos Pred Value : 0.7778
         Neg Pred Value : 0.6923
             Prevalence : 0.6250
         Detection Rate : 0.5250
   Detection Prevalence : 0.6750
      Balanced Accuracy : 0.7200

       'Positive' Class : G8
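The headline statistics in the desired output can be checked by hand from the four counts in the table. A minimal base R sketch, assuming rows are predictions and columns are reference classes as printed above (the names cm, expected and kappa_stat are illustrative only, not part of caret):

```r
# Counts copied from the desired output: rows = Prediction, columns = Reference
cm <- matrix(c(42, 8, 12, 18), nrow = 2,
             dimnames = list(Prediction = c("G8", "V4"),
                             Reference  = c("G8", "V4")))

n <- sum(cm)

# Accuracy: share of cases on the diagonal (prediction equals reference)
accuracy <- sum(diag(cm)) / n

# Agreement expected by chance, from the row and column totals
expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's kappa: agreement beyond chance, scaled by the maximum possible
kappa_stat <- (accuracy - expected) / (1 - expected)

# Sensitivity and specificity with G8 as the positive class
sensitivity <- cm["G8", "G8"] / sum(cm[, "G8"])
specificity <- cm["V4", "V4"] / sum(cm[, "V4"])

round(c(Accuracy = accuracy, Kappa = kappa_stat,
        Sensitivity = sensitivity, Specificity = specificity), 4)
```

These reproduce the Accuracy, Kappa, Sensitivity and Specificity lines above, which is a quick way to confirm which axis of the table holds the predictions.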
Here is a function to produce confusion matrices from the sub-lists (data frames) of a master list produced from classification tree models, using the function confusionMatrix from the caret package.
#Generate four new column headings:
#(1) `Predicted'
#(2) `Actual'
#(3) `Binary'
#(4) `Actual2'
my_list <- lapply(shuffle100, function(df){ #Append the four new columns to each data frame
  if (nrow(df) > 0)
    cbind(df, Predicted = c(""), Actual = c(""), Binary = c(""), Actual2 = c(""))
  else
    #Zero-row case: the new columns must also have length zero
    cbind(df, Predicted = factor(), Actual = character(), Binary = character(), Actual2 = character())
})
#Reset the four new columns so that they are filled with NA's
#`Predicted' = NA
#`Actual' = NA
#`Binary' = NA
#`Actual2' = NA
library(dplyr) #provides mutate()
Final_lists <- lapply(my_list, function(x) mutate(x, Predicted = NA, Actual = NA, Binary = NA, Actual2 = NA))
#FILL THE PREDICTED COLUMN
#Fill `Predicted' with the name of the class that has the higher probability: either V4 > G8 or G8 > V4
#This relies on columns 1 and 2 holding the class probabilities and column 3 being `Predicted'
for(i in 1:length(Final_lists)){
  for(j in 1:nrow(Final_lists[[i]])){
    #(col 2 > col 1) + 1 is 2 when column 2 is larger, otherwise 1, and picks the matching column name
    Final_lists[[i]][j,3] = names(Final_lists[[i]])[(Final_lists[[i]][j,2] > Final_lists[[i]][j,1]) + 1]
  }
}
Final_lists
#FILL THE ACTUAL COLUMN
#Fill in the `Actual' column with the true classes
#Firstly create a vector from normalised_scores$Family
normalised_Actual <- normalised_scores$Family
Actual <- normalised_Actual
#Then insert it into the column called `Actual' of each data frame in the list
#Use indices - and pass in Final_lists
Actual_list <- lapply(seq_along(Final_lists),
  function(i, x){
    x[[i]]$Actual <- Actual
    return(x[[i]])
  }, Final_lists
)
#FILL THE BINARY COLUMN
#Iterate over the ten elements of the outer list
#Iterate over each row of EACH inner data frame
#In each row, if Predicted == Actual, assign 1 to Binary, else 0
for(i in 1:length(Actual_list)){
  for(j in 1:length(Actual_list[[i]]$Predicted)){
    if(Actual_list[[i]][j,"Predicted"] == Actual_list[[i]][j,"Actual"]){
      Actual_list[[i]][j,"Binary"] <- 1
    } else {
      Actual_list[[i]][j,"Binary"] <- 0
    }
  }
}
#FILL THE ACTUAL2 COLUMN
#In each row, if Actual is "V4", assign 1 to Actual2, else 0
for(i in 1:length(Actual_list)){
  for(j in 1:length(Actual_list[[i]]$Actual)){
    if(Actual_list[[i]][j,"Actual"] == "V4"){
      Actual_list[[i]][j,"Actual2"] <- 1
    } else {
      Actual_list[[i]][j,"Actual2"] <- 0
    }
  }
}
Actual_list
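#Generate confusion matrices
confusionMatrices <- lapply(Actual_list, function(scores){
  #confusionMatrix() expects two factors that share the same levels
  lvls <- levels(factor(scores$Actual))
  confusionMatrix(factor(scores$Predicted, levels = lvls), factor(scores$Actual, levels = lvls))
})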
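For comparison, the confusion matrix can also be built inside the resampling function itself, which avoids the extra columns and loops. The sketch below is an alternative under stated assumptions, not the code from the question: a two-species subset of iris stands in for my_data (the dummy data are not reproduced here), the names conf_mats and stats are illustrative, and predictions are made on the held-out 30 % rather than on the whole subset. It relies on predict() for rpart models accepting type = "class", which returns a factor of class labels instead of a probability matrix.

```r
library(caret)
library(rpart)

set.seed(1235)

# Stand-in for my_data (an assumption): two iris species, class column renamed to Family
my_data <- droplevels(subset(iris, Species != "setosa"))
names(my_data)[names(my_data) == "Species"] <- "Family"

conf_mats <- lapply(seq_len(10), function(n) {
  # Draw 80 rows at random, without replacement
  shuffled <- my_data[sample(nrow(my_data), 80), ]

  # Stratified 70/30 split on the class label
  train_idx <- createDataPartition(shuffled$Family, p = 0.7, list = FALSE)
  train <- shuffled[train_idx, ]
  test  <- shuffled[-train_idx, ]

  ct_mod <- rpart(Family ~ ., data = train, method = "class",
                  control = rpart.control(cp = 0.005))

  # type = "class" gives predicted labels as a factor with the training levels
  ct_pred <- predict(ct_mod, newdata = test, type = "class")

  # Both arguments are factors with the same levels
  confusionMatrix(ct_pred, test$Family)
})

# One row per resample, with Accuracy and Kappa as columns
stats <- t(sapply(conf_mats, function(cm) cm$overall[c("Accuracy", "Kappa")]))
stats
```

Each element of conf_mats prints in the same layout as the desired output, and the per-class figures (sensitivity, specificity and so on) are available from the byClass element of each object.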