How can I calculate the F1-measure and ROC in a multiclass classification problem in R?
I have this code for a multiclass classification problem:
data$Class = as.factor(data$Class)
levels(data$Class) <- make.names(levels(factor(data$Class)))
trainIndex <- createDataPartition(data$Class, p = 0.6, list = FALSE, times=1)
trainingSet <- data[ trainIndex,]
testingSet <- data[-trainIndex,]
train_x <- trainingSet[, -ncol(trainingSet)]
train_y <- trainingSet$Class
testing_x <- testingSet[, -ncol(testingSet)]
testing_y <- testingSet$Class
oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM
summary(oneRM)
plot(oneRM)
oneRM_pred <- predict(oneRM, testing_x)
oneRM_pred
eval_model(oneRM_pred, testing_y)
AUC_oneRM_pred <- auc(roc(oneRM_pred,testing_y))
cat("AUC =", AUC_oneRM_pred)
# Recall-Precision curve
oneRM_prediction <- prediction(oneRM_pred, testing_y)
RP.perf <- performance(oneRM_prediction, "tpr", "fpr")
plot (RP.perf)
plot(roc(oneRM_pred,testing_y))
But the code does not work after this line:
oneRM_prediction <- prediction(oneRM_pred, testing_y)
I get this error:
Error in prediction(oneRM_pred, testing_y) : Format of predictions is invalid.
In addition, I don't know how to easily get the F1-measure.
Finally, a question: does it make sense to calculate AUC in a multiclass classification problem?
Let's start with F1.
Assuming that you are using the iris dataset: first we need to load everything, train the model, and perform the predictions as you did.
library(datasets)
library(caret)
library(OneR)
library(pROC)
trainIndex <- createDataPartition(iris$Species, p = 0.6, list = FALSE, times=1)
trainingSet <- iris[ trainIndex,]
testingSet <- iris[-trainIndex,]
train_x <- trainingSet[, -ncol(trainingSet)]
train_y <- trainingSet$Species
testing_x <- testingSet[, -ncol(testingSet)]
testing_y <- testingSet$Species
oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM_pred <- predict(oneRM, testing_x)
Then, you should calculate the precision, recall, and F1 for each class.
cm <- as.matrix(confusionMatrix(oneRM_pred, testing_y))
n = sum(cm) # number of instances
nc = nrow(cm) # number of classes
rowsums = apply(cm, 1, sum) # number of predictions per class (rows = predicted)
colsums = apply(cm, 2, sum) # number of instances per class (columns = actual)
diag = diag(cm) # number of correctly classified instances per class
precision = diag / rowsums
recall = diag / colsums
f1 = 2 * precision * recall / (precision + recall)
print(" ************ Confusion Matrix ************")
print(cm)
print(" ************ Diag ************")
print(diag)
print(" ************ Precision/Recall/F1 ************")
print(data.frame(precision, recall, f1))
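As a cross-check (assuming the same `oneRM_pred` and `testing_y` objects from the code above), caret's `confusionMatrix()` already reports these per-class metrics in its `byClass` matrix, so you can compare them against the manual calculation:

```r
# caret computes the same per-class metrics itself; in caret's
# terminology, "Pos Pred Value" is precision and "Sensitivity" is recall,
# but byClass also carries explicit Precision/Recall/F1 columns
cm_full <- confusionMatrix(oneRM_pred, testing_y)
cm_full$byClass[, c("Precision", "Recall", "F1")]
```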
After that, you can compute the macro F1.
macroPrecision = mean(precision)
macroRecall = mean(recall)
macroF1 = mean(f1)
print(" ************ Macro Precision/Recall/F1 ************")
print(data.frame(macroPrecision, macroRecall, macroF1))
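Macro averaging weights every class equally. If you instead want classes weighted by their support, you can micro-average by pooling the counts first; for single-label multiclass problems, micro precision, micro recall, and micro F1 all collapse to overall accuracy. A minimal base-R sketch on a toy confusion matrix (made-up counts, not the iris results):

```r
# Toy confusion matrix (rows = predicted, cols = actual; hypothetical counts)
cm <- matrix(c(13,  2,  0,
                1, 10,  3,
                0,  1, 12), nrow = 3, byrow = TRUE)

# Micro-averaging pools TP and prediction counts across classes
microPrecision <- sum(diag(cm)) / sum(cm)
# For single-label multiclass, micro precision = micro recall = micro F1
microF1 <- microPrecision
print(microF1)  # [1] 0.8333333
```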
To find the ROC (more precisely, the AUC), it is best to use the pROC library.
print(" ************ AUC ************")
roc.multi <- multiclass.roc(testing_y, as.numeric(oneRM_pred))
print(auc(roc.multi))
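`multiclass.roc()` handles the multiclass case by computing pairwise (class-vs-class) ROC curves and averaging their AUCs. If you also want to see the individual curves, they are stored in the returned object's `rocs` list; assuming the `roc.multi` object from above, a sketch to overlay them:

```r
# Each element of roc.multi$rocs is one pairwise ROC curve;
# plot the first and overlay the rest in different colors
rs <- roc.multi[["rocs"]]
plot.roc(rs[[1]])
sapply(2:length(rs), function(i) lines.roc(rs[[i]], col = i))
```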
Hope that it helps you.
You can find more details at this link for F1 and this one for AUC.
If I use levels(oneRM_pred) <- levels(testing_y) in this way:
...
oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM
summary(oneRM)
plot(oneRM)
oneRM_pred <- predict(oneRM, testing_x)
levels(oneRM_pred) <- levels(testing_y)
...
The accuracy is much lower than before, so I am not sure that forcing the same levels is a good solution.
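That accuracy drop is expected: `levels(x) <- ...` renames the levels *by position*, so if the two factors list their levels in a different order, predictions get silently relabeled as different classes. The safer way to align factors is to rebuild with `factor()`, which re-maps by value. A minimal base-R sketch on a toy factor:

```r
# A prediction factor whose levels happen to be ordered ("b", "a")
pred <- factor(c("b", "a", "b"), levels = c("b", "a"))

# levels<- renames by POSITION: level 1 "b" becomes "a" and vice versa,
# silently corrupting the predictions
bad <- pred
levels(bad) <- c("a", "b")
as.character(bad)   # "a" "b" "a"  -- labels flipped!

# factor() re-maps by VALUE: each prediction keeps its meaning,
# only the level ordering changes
good <- factor(pred, levels = c("a", "b"))
as.character(good)  # "b" "a" "b"  -- unchanged
```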