How to compute the confusion matrix for multiclass classification in scikit-learn?

Question

I have a multiclass classification task. When I run my script, which is based on the scikit-learn example, as follows:

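from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.multiclass import OneVsRestClassifier

classifier = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02))
y_pred = classifier.fit(X_train, y_train).predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)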

I get this error:

File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 242, in confusion_matrix
    raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported

I tried passing labels=classifier.classes_ to confusion_matrix(), but it doesn't help.
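confusion_matrix only accepts 1-D label vectors, which scikit-learn classifies as "binary" or "multiclass" targets. The y_test and y_pred printed below are one-hot (indicator) matrices, and scikit-learn's target-type check labels those "multilabel-indicator", which is exactly the type named in the error. Because that check runs on the shape of y before labels= is considered, passing labels= cannot help. A quick way to confirm the diagnosis is sklearn.utils.multiclass.type_of_target; this sketch uses the first three rows of the question's y_test.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# First three rows of the one-hot y_test shown in the question
y_onehot = np.array([[0, 0, 0, 1, 0, 0],
                     [0, 0, 0, 0, 1, 0],
                     [0, 1, 0, 0, 0, 0]])

# A 2-D 0/1 matrix is detected as a multilabel indicator, which confusion_matrix rejects
onehot_type = type_of_target(y_onehot)
# Collapsing each row to its column index gives a 1-D multiclass label vector
label_type = type_of_target(y_onehot.argmax(axis=1))

print(onehot_type)  # multilabel-indicator
print(label_type)   # multiclass
```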

y_test and y_pred look like this:

y_test =
array([[0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0],
       [0, 1, 0, 0, 0, 0],
       ...,
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 1, 0, 0],
       [0, 0, 0, 0, 1, 0]])

y_pred =
array([[0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0],
       ...,
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 1],
       [0, 0, 0, 0, 0, 0]])
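The all-zero rows in y_pred point to a likely root cause: if OneVsRestClassifier is fit on a one-hot y_train, it treats the task as multilabel, lets each per-class estimator decide independently, and returns an indicator matrix in which a sample may receive no class at all. A sketch of one fix, assuming y_train is one-hot like y_test: fit on integer labels instead. With a 1-D multiclass target, OneVsRestClassifier.predict returns the single highest-scoring class per sample, so its output can go straight into confusion_matrix. The make_classification data is only a stand-in for the asker's X and y, which are not shown.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

n_classes = 6  # the question's y_test has 6 columns

# Synthetic stand-in for the asker's (unshown) data
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           n_classes=n_classes, random_state=0)
Y = np.eye(n_classes, dtype=int)[y]  # one-hot targets, like the question's y_train/y_test

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Collapse one-hot rows to integer class labels before fitting,
# so OneVsRestClassifier treats the task as multiclass, not multilabel
y_train = Y_train.argmax(axis=1)
y_test = Y_test.argmax(axis=1)

classifier = OneVsRestClassifier(
    GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02))
y_pred = classifier.fit(X_train, y_train).predict(X_test)  # 1-D: one label per sample

cnf_matrix = confusion_matrix(y_test, y_pred, labels=np.arange(n_classes))
print(cnf_matrix)  # rows: true class, columns: predicted class
```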

Answer 1

First you need to turn the one-hot output back into an array of label indexes. Let's say you have 3 classes, 'cat', 'dog' and 'house', indexed 0, 1, 2, and the predictions for 2 samples are 'dog' and 'house'. Your one-hot output will be:

import numpy as np
y_pred = np.array([[0, 1, 0], [0, 0, 1]])

Running y_pred.argmax(1) gives [1, 2]. This array holds the original label indexes, i.e. ['dog', 'house'].

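import numpy as np
from keras.utils import np_utils  # older standalone Keras; with tf.keras use tensorflow.keras.utils.to_categorical

num_classes = 3

# from label indexes to one-hot (categorical) vectors
y_prediction = np.array([1, 2])
y_categorical = np_utils.to_categorical(y_prediction, num_classes)

# from one-hot (categorical) vectors back to label indexes
y_pred = y_categorical.argmax(1)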
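The answer stops at recovering label indexes; passing both index arrays to confusion_matrix completes the job. A minimal sketch for the cat/dog/house example, where the four one-hot samples are made up for illustration. Passing labels= fixes the row and column order and keeps a row for any class that never occurs.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

class_names = ['cat', 'dog', 'house']  # indexes 0, 1, 2

# Hypothetical one-hot ground truth and predictions for four samples
y_true_onehot = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_pred_onehot = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 1, 0]])

# One-hot rows -> label indexes
y_true = y_true_onehot.argmax(1)  # [0, 1, 2, 1]
y_pred = y_pred_onehot.argmax(1)  # [0, 2, 2, 1]

cm = confusion_matrix(y_true, y_pred, labels=np.arange(len(class_names)))
print(cm)  # rows: true class, columns: predicted class
# [[1 0 0]
#  [0 1 1]
#  [0 0 1]]
```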

Answer 2

This worked for me:

import numpy as np
from sklearn.metrics import confusion_matrix

y_test_non_category = [np.argmax(t) for t in y_test]
y_predict_non_category = [np.argmax(t) for t in y_predict]
conf_mat = confusion_matrix(y_test_non_category, y_predict_non_category)

where y_test and y_predict are one-hot encoded (categorical) arrays.
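One caveat for the argmax approach: np.argmax returns 0 for an all-zero row, and the question's y_pred contains many such rows (samples for which no one-vs-rest estimator predicted its class). Those samples would be silently counted as predictions of class 0. A sketch of one way to keep them visible, using the six rows printed in the question: route empty rows to an extra "no class" label. The extra label index is a convention chosen here for illustration, not something scikit-learn defines.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# The first three and last three rows of y_test and y_pred shown in the question
y_test = np.array([[0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0], [0, 1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 1], [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0]])
y_predict = np.array([[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
                      [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0]])

n_classes = y_test.shape[1]
NO_CLASS = n_classes  # hypothetical extra label index for "nothing predicted"

# Vectorised equivalent of [np.argmax(t) for t in ...]
y_test_idx = y_test.argmax(axis=1)
y_predict_idx = y_predict.argmax(axis=1)

# argmax of an all-zero row is 0, which would count it as class 0;
# send those rows to the extra label instead
y_predict_idx[y_predict.sum(axis=1) == 0] = NO_CLASS

conf_mat = confusion_matrix(y_test_idx, y_predict_idx, labels=np.arange(n_classes + 1))
print(conf_mat)  # the last column counts samples for which no class was predicted
```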

Answer 3

I just subtracted the output matrix y_test from the prediction matrix y_pred, keeping the one-hot (categorical) format. A value of -1 means a false negative and a value of 1 means a false positive.

Next:

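produced_matrix = predictions_matrix - output_matrix  # 1: false positive, -1: false negative
# where both matrices are 1, mark a true positive
produced_matrix[(output_matrix == 1) & (predictions_matrix == 1)] = 2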

This gives the following encoding:

-1: false negative
1: false positive
0: true negative
2: true positive

Finally, by doing some simple counting, you can derive any confusion-matrix metric.
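The counting can be done column by column, which yields per-class (one-vs-rest) true/false positive/negative counts rather than a single K x K matrix. A minimal NumPy sketch of this encoding, applied to the six rows of y_test and y_pred printed in the question. scikit-learn 0.21 and later also provide sklearn.metrics.multilabel_confusion_matrix, which accepts indicator matrices directly and returns one [[tn, fp], [fn, tp]] matrix per class; the sketch cross-checks its counts against it.

```python
import numpy as np
from sklearn.metrics import multilabel_confusion_matrix

# The first three and last three rows of y_test and y_pred shown in the question
y_test = np.array([[0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0], [0, 1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 1], [0, 0, 0, 1, 0, 0], [0, 0, 0, 0, 1, 0]])
y_pred = np.array([[0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0],
                   [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 1], [0, 0, 0, 0, 0, 0]])

# Encoding from this answer: 1 = false positive, -1 = false negative, 0 = agreement
produced = y_pred - y_test
produced[(y_test == 1) & (y_pred == 1)] = 2  # 2 = true positive, so 0 now means true negative

# Count each code per column, i.e. per class
tp = (produced == 2).sum(axis=0)
fp = (produced == 1).sum(axis=0)
fn = (produced == -1).sum(axis=0)
tn = (produced == 0).sum(axis=0)

# Cross-check: scikit-learn's per-class 2x2 matrices are laid out as [[tn, fp], [fn, tp]]
mcm = multilabel_confusion_matrix(y_test, y_pred)
assert (mcm[:, 1, 1] == tp).all() and (mcm[:, 0, 1] == fp).all()
assert (mcm[:, 1, 0] == fn).all() and (mcm[:, 0, 0] == tn).all()
print("tp", tp, "fp", fp, "fn", fn, "tn", tn)
```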

3 answers:

Answer 1: score 7, accepted, 2017-05-07 09:38:46
Answer 2: score 7, 2018-01-05 14:03:32
Answer 3: score 0, 2018-05-23 11:49:38
