
Confusion matrix returns a single-element matrix

I found an issue with scikit-learn's confusion_matrix.

I call confusion_matrix inside a KFold loop. When y_true and y_pred agree 100% in a fold, confusion_matrix returns a matrix containing a single number. This breaks my accumulated confusion matrix, because I add the result from each fold to it. Does anyone have a solution for this?

Here is my code:

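import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB

# x and y are my feature matrix and label array (not shown)
model = MultinomialNB()
kf = KFold(n_splits=10)
cf = np.array([[0, 0], [0, 0]])
for train_index, test_index in kf.split(x):
    x_train, x_test = x[train_index], x[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    cf += confusion_matrix(y_test, y_pred)  # not always 2x2, see question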

Thank you.

The cleanest way is probably to pass a list of all possible classes as the labels argument. Here is an example that shows the issue and its resolution (based on made-up data for the truth and predictions).

from sklearn.metrics import confusion_matrix
import numpy as np

y_test = np.array([1, 1, 1, 1, 1, 0, 0])
y_pred = np.array([0, 1, 1, 1, 1, 0, 0])

labels = np.unique(y_test)

cf = np.array([[0, 0], [0, 0]])

for indices in [[0, 1, 2, 3], [1, 2, 3], [1, 2, 3, 4, 5, 6]]:
    cm1 = confusion_matrix(y_test[indices], y_pred[indices])
    cm2 = confusion_matrix(y_test[indices], y_pred[indices], labels=labels)
    print(cm1.shape == (2, 2), cm2.shape == (2, 2))

In the first and third subsets, both classes appear. In the second subset, only one class appears, so cm1 does not have shape (2,2) (it comes out as (1,1)). By listing all potential classes in labels, cm2 always has shape (2,2).
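To see why the label set controls the shape, here is a minimal sketch of a count matrix built by hand. It is not scikit-learn's implementation; count_matrix is a hypothetical helper written only for illustration. When no labels are given, it infers them from the values present in the subset, which mirrors the documented default of confusion_matrix.

```python
import numpy as np

def count_matrix(y_true, y_pred, labels=None):
    """Hypothetical helper: count (true, predicted) pairs per label."""
    if labels is None:
        # Infer the label set from the values present in this subset only.
        labels = np.unique(np.concatenate([y_true, y_pred]))
    index = {label: i for i, label in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[index[t], index[p]] += 1
    return cm

# A subset in which only class 1 appears, as in the second subset above.
y_true = np.array([1, 1, 1])
y_pred = np.array([1, 1, 1])

inferred = count_matrix(y_true, y_pred)             # label set is {1}
fixed = count_matrix(y_true, y_pred, labels=[0, 1])  # label set is {0, 1}
print(inferred.shape, fixed.shape)
```

With the inferred label set there is only one row and one column to count into; with the fixed label set the row and column for class 0 are still there, just filled with zeros.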

If you already know that the labels can only be 0 or 1, you could just pass labels=[0,1], but calling np.unique on the full label array is more robust.
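Applied to the cross-validation loop from the question, the fix might look like the sketch below. The toy X and y are assumptions made up for illustration (the question's data is not shown); they are ordered so that some unshuffled folds contain a single class.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB

# Toy data (assumed for illustration): first ten samples are class 1,
# last ten are class 0, so the outer folds hold only one class.
X = np.array([[1, 3]] * 10 + [[0, 10]] * 10)
y = np.array([1] * 10 + [0] * 10)

labels = np.unique(y)  # computed once, on the full label array
cf = np.zeros((len(labels), len(labels)), dtype=int)

model = MultinomialNB()
kf = KFold(n_splits=5)
for train_index, test_index in kf.split(X):
    model.fit(X[train_index], y[train_index])
    y_pred = model.predict(X[test_index])
    # labels= keeps every per-fold matrix at the same shape as cf
    fold_cm = confusion_matrix(y[test_index], y_pred, labels=labels)
    cf += fold_cm

print(cf)
```

Sizing cf from len(labels) rather than hard-coding a 2x2 array means the same loop should carry over to problems with more than two classes.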

You can first check whether all pred_values are equal to true_values. If so, just increment the [0][0] and [1][1] entries of your confusion matrix by the number of 0s and 1s in true_values (or pred_values).

import numpy as np
import pandas as pd
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import KFold
from sklearn.naive_bayes import MultinomialNB

X = pd.DataFrame({'f1': [1]*10 + [0]*10,
                  'f2': [3]*10 + [10]*10}).values
y = np.array([1]*10 + [0]*10)
model = MultinomialNB()
kf = KFold(n_splits=5)
cf = np.array([[0, 0], [0, 0]])
for train_index, test_index in kf.split(X):
    x_train, x_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model.fit(x_train, y_train)
    y_pred = model.predict(x_test)
    if all(y_test == y_pred):  # perfect prediction in this fold
        cf[0][0] += sum(y_pred == 0)  # increment by number of 0 values
        cf[1][1] += sum(y_pred == 1)  # increment by number of 1 values
    else:
        cf += confusion_matrix(y_test, y_pred)  # otherwise add the fold's matrix

Result of print(cf):

[[10  0]
 [ 0 10]]

Be careful about overfitting.
