How to compute a confusion matrix for multiclass classification in scikit-learn?
I have a multiclass classification task. When I run my script, which is based on the scikit-learn example, as follows:
classifier = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02))
y_pred = classifier.fit(X_train, y_train).predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)
I get this error:
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 242, in confusion_matrix
raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported
I tried passing labels=classifier.classes_ to confusion_matrix(), but it doesn't help.
y_test and y_pred are as follows:
y_test =
array([[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 1, 0, 0, 0, 0],
...,
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0]])
y_pred =
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
...,
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 0, 0, 0]])
First you need to create the label output array. Let's say you have 3 classes, 'cat', 'dog' and 'house', indexed 0, 1, 2, and the predictions for 2 samples are 'dog' and 'house'. Your output will be:
y_pred = [[0, 1, 0],[0, 0, 1]]
Run y_pred.argmax(1) (with y_pred as a NumPy array) to get [1, 2]. This array holds the original label indexes, meaning ['dog', 'house'].
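import numpy as np
from keras.utils import np_utils  # assumed source of np_utils (older Keras releases)

num_classes = 3
# from label indexes to categorical (one-hot)
y_prediction = np.array([1,2])
y_categorial = np_utils.to_categorical(y_prediction, num_classes)
# from categorical (one-hot) back to label indexes
y_pred = y_categorial.argmax(1)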
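The round trip above can also be sketched with NumPy alone. This is only an illustration: np.eye is my stand-in for np_utils.to_categorical, and the names class_names, y_one_hot, y_labels and predicted_names are made up for the example.

```python
import numpy as np

class_names = ['cat', 'dog', 'house']  # indexed 0, 1, 2, as in the example above
num_classes = len(class_names)

# from label indexes to one-hot rows (NumPy stand-in for to_categorical)
y_prediction = np.array([1, 2])
y_one_hot = np.eye(num_classes, dtype=int)[y_prediction]

# from one-hot rows back to label indexes
y_labels = y_one_hot.argmax(axis=1)

# map the indexes back to the class names
predicted_names = [class_names[i] for i in y_labels]

print(y_one_hot.tolist())   # one-hot rows
print(y_labels.tolist())    # label indexes
print(predicted_names)      # class names
```

The index arrays produced this way are one-dimensional, which is the input shape confusion_matrix accepts for a multiclass problem.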
This worked for me:
import numpy as np
from sklearn.metrics import confusion_matrix

# convert each one-hot row to its class index
y_test_non_category = [ np.argmax(t) for t in y_test ]
y_predict_non_category = [ np.argmax(t) for t in y_predict ]
conf_mat = confusion_matrix(y_test_non_category, y_predict_non_category)
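One caveat with the argmax conversion: np.argmax returns 0 for an all-zero row, so predictions like the all-zero rows shown in the question would be silently counted as class 0. A possible workaround, sketched here on made-up toy data, is to map such rows to an extra label. The NO_PREDICTION label and the toy arrays are my assumptions, not part of the original code.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# toy one-hot data with 3 classes; the last prediction row is all zeros
y_test = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]])
y_predict = np.array([[1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0]])

n_classes = y_test.shape[1]
NO_PREDICTION = n_classes  # hypothetical extra label for all-zero rows

y_true_labels = y_test.argmax(axis=1)
# use argmax where a class was predicted, NO_PREDICTION otherwise
y_pred_labels = np.where(y_predict.sum(axis=1) == 0,
                         NO_PREDICTION,
                         y_predict.argmax(axis=1))

# rows are true classes, columns are predicted classes;
# the last column counts samples for which no class was predicted
conf_mat = confusion_matrix(y_true_labels, y_pred_labels,
                            labels=list(range(n_classes + 1)))
print(conf_mat)
```

Passing labels explicitly keeps the matrix shape fixed even when some class never appears in the data.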
where y_test and y_predict are categorical variables like one-hot vectors.
I just subtracted the output y_test matrix from the prediction y_pred matrix while keeping the categorical format. In case of -1, I assumed a false negative, while in case of 1, a false positive.
Next:
if output_matrix[i,j] == 1 and predictions_matrix[i,j] == 1:
    produced_matrix[i,j] = 2
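Ending up with the following notation: -1 for a false negative, 1 for a false positive, 2 for a true positive, and, by elimination, 0 for a true negative.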
Finally, by performing some naive counting, you can produce any confusion metric.
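A vectorized sketch of this subtract-and-count idea, on made-up toy data, might look as follows. It reuses the names output_matrix, predictions_matrix and produced_matrix from the snippet above; the toy arrays and the per-class counters tp, fp, fn and tn are my own additions, and reading 0 as a true negative is an inference from the subtraction rather than something stated in the answer.

```python
import numpy as np

# toy one-hot data: rows are samples, columns are classes
output_matrix = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]])       # y_test
predictions_matrix = np.array([[0, 0, 1], [0, 0, 1], [1, 0, 0], [0, 0, 0]])  # y_pred

# prediction minus truth: -1 marks a false negative, 1 a false positive
produced_matrix = predictions_matrix - output_matrix
# cells where both are 1 would be 0 after subtraction, so mark them as 2
produced_matrix[(output_matrix == 1) & (predictions_matrix == 1)] = 2

# naive counting per class (column)
tp = (produced_matrix == 2).sum(axis=0)
fp = (produced_matrix == 1).sum(axis=0)
fn = (produced_matrix == -1).sum(axis=0)
tn = (produced_matrix == 0).sum(axis=0)  # both truth and prediction are 0

print(produced_matrix)
print(tp, fp, fn, tn)
```

From these four counts, per-class metrics such as precision (tp / (tp + fp)) or recall (tp / (tp + fn)) follow directly, as long as the denominators are non-zero.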