How can I calculate the F1-measure and ROC in a multiclass classification problem in R?

I have this code for a multiclass classification problem:

data$Class = as.factor(data$Class)
levels(data$Class) <- make.names(levels(factor(data$Class)))
trainIndex <- createDataPartition(data$Class, p = 0.6, list = FALSE, times=1)
trainingSet <- data[ trainIndex,]
testingSet  <- data[-trainIndex,]
train_x <- trainingSet[, -ncol(trainingSet)]
train_y <- trainingSet$Class

testing_x <- testingSet[, -ncol(testingSet)]
testing_y <- testingSet$Class

oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM
summary(oneRM)
plot(oneRM)    

oneRM_pred <- predict(oneRM, testing_x)
oneRM_pred

eval_model(oneRM_pred, testing_y)


AUC_oneRM_pred <- auc(roc(oneRM_pred,testing_y))
cat ("AUC=", oneRM_pred)

# Recall-Precision curve    
oneRM_prediction <- prediction(oneRM_pred, testing_y)
RP.perf <- performance(oneRM_prediction, "tpr", "fpr")

plot (RP.perf)

plot(roc(oneRM_pred,testing_y))

But the code does not work after this line:

oneRM_prediction <- prediction(oneRM_pred, testing_y)

I get this error:

Error in prediction(oneRM_pred, testing_y) : Format of predictions is invalid.

In addition, I don't know how to get the F1-measure easily.

Finally, does it make sense to calculate the AUC in a multiclass classification problem?

Let's start with F1.

Assuming that you are using the iris dataset, we first need to load the libraries, train the model, and make the predictions as you did.

library(datasets)
library(caret)
library(OneR)
library(pROC)

trainIndex <- createDataPartition(iris$Species, p = 0.6, list = FALSE, times=1)
trainingSet <- iris[ trainIndex,]
testingSet  <- iris[-trainIndex,]
train_x <- trainingSet[, -ncol(trainingSet)]
train_y <- trainingSet$Species

testing_x <- testingSet[, -ncol(testingSet)]
testing_y <- testingSet$Species

oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM_pred <- predict(oneRM, testing_x)

Then, you should calculate the precision, recall, and F1 for each class.

cm <- as.matrix(confusionMatrix(oneRM_pred, testing_y))
n = sum(cm) # number of instances
nc = nrow(cm) # number of classes
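rowsums = apply(cm, 1, sum) # number of predictions per class (caret puts predictions in rows)
colsums = apply(cm, 2, sum) # number of actual instances per class (reference classes in columns)
diag = diag(cm)  # number of correctly classified instances per class

precision = diag / rowsums  # correct / predicted as the class
recall = diag / colsums     # correct / actually in the class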
f1 = 2 * precision * recall / (precision + recall) 

print(" ************ Confusion Matrix ************")
print(cm)
print(" ************ Diag ************")
print(diag)
print(" ************ Precision/Recall/F1 ************")
print(data.frame(precision, recall, f1)) 
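
A quick way to check which margin belongs to precision and which to recall is a tiny hand-built matrix. The sketch below uses base R only and assumes the same layout that caret's confusionMatrix prints (rows = Prediction, columns = Reference); the counts and the class names "a", "b", "c" are made up for illustration.

```r
# Toy confusion matrix: rows = predicted class, columns = actual class.
cm_toy <- matrix(c(5, 1, 0,
                   2, 3, 1,
                   0, 1, 4),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Prediction = c("a", "b", "c"),
                                 Reference  = c("a", "b", "c")))

correct_toy   <- diag(cm_toy)                   # correctly classified per class
precision_toy <- correct_toy / rowSums(cm_toy)  # correct / predicted as the class
recall_toy    <- correct_toy / colSums(cm_toy)  # correct / actually in the class
f1_toy        <- 2 * precision_toy * recall_toy / (precision_toy + recall_toy)

print(round(data.frame(precision_toy, recall_toy, f1_toy), 3))
```

For class "a", 6 cases were predicted as "a" and 5 of them are right, so precision is 5/6, while 7 cases are truly "a", so recall is 5/7. F1 is symmetric in the two, so it comes out the same even if precision and recall are swapped by mistake.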

After that, you can compute the macro-averaged precision, recall, and F1.

macroPrecision = mean(precision)
macroRecall = mean(recall)
macroF1 = mean(f1)

print(" ************ Macro Precision/Recall/F1 ************")
print(data.frame(macroPrecision, macroRecall, macroF1)) 
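
One caveat with a simple rule learner: if some class is never predicted, its precision is 0/0, so its F1 and the plain mean of the F1 values both become NaN. The base R sketch below shows one way to handle that, by counting an undefined F1 as 0. That convention is an assumption of this sketch, not something stated in the answer; dropping the class with mean(f1, na.rm = TRUE) is another option and gives a more optimistic number. The matrix is hypothetical.

```r
# Hypothetical confusion matrix (rows = predicted, columns = actual)
# in which class "c" is never predicted.
cm_gap <- matrix(c(4, 1, 2,
                   1, 4, 3,
                   0, 0, 0),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(Prediction = c("a", "b", "c"),
                                 Reference  = c("a", "b", "c")))

correct_gap   <- diag(cm_gap)
precision_gap <- correct_gap / rowSums(cm_gap)  # 0/0 = NaN for "c"
recall_gap    <- correct_gap / colSums(cm_gap)  # 0 for "c"
f1_gap        <- 2 * precision_gap * recall_gap / (precision_gap + recall_gap)

macro_f1_naive <- mean(f1_gap)                  # NaN because of class "c"

# Assumed convention: an undefined F1 counts as 0 for that class.
f1_zero_filled <- ifelse(is.nan(f1_gap), 0, f1_gap)
macro_f1_zero  <- mean(f1_zero_filled)

print(c(naive = macro_f1_naive, zero_filled = macro_f1_zero))
```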

To get the ROC (more precisely, the AUC), it is best to use the pROC library. Here the predicted factor is converted to its integer level codes so that it can serve as the numeric predictor.

print(" ************ AUC ************")
roc.multi <- multiclass.roc(testing_y, as.numeric(oneRM_pred))
print(auc(roc.multi))
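
To give a feel for what a single multiclass AUC number can mean, here is a base R sketch that averages a two-class AUC over every pair of classes. It is only an illustration of the pairwise-averaging idea, not pROC's implementation, and its result is not guaranteed to match multiclass.roc. The function names pair_auc and pairwise_mean_auc, the direction-agnostic max(a, 1 - a) step, and the toy data are all assumptions of this sketch.

```r
# AUC for two groups of scores via the rank-sum (Mann-Whitney) identity:
# the probability that a "positive" score exceeds a "negative" one,
# with ties counted as one half.
pair_auc <- function(score_pos, score_neg) {
  r     <- rank(c(score_pos, score_neg))  # average ranks for ties
  n_pos <- length(score_pos)
  n_neg <- length(score_neg)
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Mean AUC over all unordered pairs of classes.
pairwise_mean_auc <- function(actual, score) {
  pairs <- combn(levels(actual), 2, simplify = FALSE)
  aucs  <- vapply(pairs, function(p) {
    a <- pair_auc(score[actual == p[2]], score[actual == p[1]])
    max(a, 1 - a)  # ignore the direction of the separation (assumption)
  }, numeric(1))
  mean(aucs)
}

actual_toy <- factor(c("a", "a", "b", "b", "c", "c"))
score_toy  <- c(1, 2, 2, 3, 3, 3)  # hypothetical numeric scores
auc_toy    <- pairwise_mean_auc(actual_toy, score_toy)
print(auc_toy)
```

Note that a score built from class codes depends on the order of the factor levels, so the result is only meaningful when the predicted and actual factors share the same levels in the same order.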

Hope this helps.

Find details at this link for F1 and this one for AUC.

If I use levels(oneRM_pred) <- levels(testing_y) in this way:

...
oneRM <- OneR(trainingSet, verbose = TRUE)
oneRM
summary(oneRM)
plot(oneRM)    

oneRM_pred <- predict(oneRM, testing_x)
levels(oneRM_pred) <- levels(testing_y)
...

The accuracy is much lower than before, so I am not sure whether enforcing the same levels is a good solution.
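
A possible explanation, which nothing in this thread confirms: assigning with levels(x) <- ... renames the levels by position and does not reorder them. If the predicted factor lists its levels in a different order than testing_y, every prediction is silently relabelled. The base R sketch below uses made-up factors (pred_toy and truth_toy are hypothetical) to show the difference between relabelling and rebuilding the factor with the desired level order.

```r
# Hypothetical predictions whose levels are in a different order than the truth.
pred_toy  <- factor(c("b", "a", "c", "a"), levels = c("c", "b", "a"))
truth_toy <- factor(c("b", "a", "c", "a"), levels = c("a", "b", "c"))

# Positional relabelling: level 1 ("c") is renamed "a", level 3 ("a") is renamed "c".
relabelled <- pred_toy
levels(relabelled) <- levels(truth_toy)

# Rebuilding the factor: each observation keeps its label, only the level order changes.
reordered <- factor(as.character(pred_toy), levels = levels(truth_toy))

acc_relabelled <- mean(as.character(relabelled) == as.character(truth_toy))
acc_reordered  <- mean(as.character(reordered)  == as.character(truth_toy))
print(c(relabelled = acc_relabelled, reordered = acc_reordered))
```

In this toy case the predictions are all correct, yet the relabelled version scores 0.25 while the rebuilt factor scores 1. Any label in the predictions that is missing from levels(truth_toy) would become NA in the rebuilt factor, which is worth checking before computing metrics.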
