White lines in confusion matrix?

I have a pretty general question about numpy matrices: I've tried to normalize the results row by row, but I'm getting some weird white lines. Is this because of some zeros stuck somewhere in the division?

Here is the code:

import numpy as np
import matplotlib.pyplot as plt

def confusion_matrix(results, tagset):
    # results : list of tuples (predicted, true)
    # tagset  : list of tags
    np.seterr(divide='ignore', invalid='ignore')  # silences the 0/0 warnings
    mat = np.zeros((len(tagset), len(tagset)))
    percent = [0, 0]  # [number correct, total]
    for guessed, real in results:
        # rows = guessed (predicted) tag, columns = real (true) tag
        mat[tagset.index(guessed), tagset.index(real)] += 1
        if guessed == real:
            percent[0] += 1
        percent[1] += 1
    # normalize each row by how many times that tag was guessed
    mat /= mat.sum(axis=1)[:, np.newaxis]
    plt.matshow(mat, fignum=100)
    plt.xticks(np.arange(len(tagset)), tagset, rotation=90, size='x-small')
    plt.yticks(np.arange(len(tagset)), tagset, size='x-small')
    plt.colorbar()
    plt.show()
    # print("\n".join(["\t".join([""] + tagset)] +
    #                 ["\t".join([tagset[i]] + [str(x) for x in mat[i, :]])
    #                  for i in range(mat.shape[0])]))
    return (percent[0] / float(percent[1])) * 100

Thanks for your time! (I hope the answer is not too obvious.)

In a nutshell, you have some tags that were never guessed. Because you're normalizing each row by the number of times that tag was guessed, those rows are 0/0, which yields np.nan. By default, matplotlib colormaps draw NaN values with a fully transparent "bad" color, so the background of the axes (white by default) shows through.
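A quick way to see the 0/0 without any plotting is to divide a count matrix that has an all-zero row by its row sums. The counts below are toy numbers chosen for illustration (rows are guessed tags, columns are true tags, and the last tag is never guessed); this is a sketch, not code from the question.

```python
import numpy as np

# Toy counts (illustrative only): rows = guessed tag, columns = true tag.
# The last tag was never guessed, so its whole row is zero.
counts = np.array([[1., 0., 0., 1.],
                   [0., 2., 0., 0.],
                   [1., 1., 1., 0.],
                   [0., 0., 0., 0.]])

# Row sums as a column vector so the division broadcasts across each row.
row_totals = counts.sum(axis=1)[:, np.newaxis]

# Silence the "invalid value" warning for this block only; 0/0 still gives nan.
with np.errstate(divide='ignore', invalid='ignore'):
    normalized = counts / row_totals

print(row_totals.ravel())                # [2. 2. 3. 0.]
print(np.isnan(normalized).any(axis=1))  # [False False False  True]
```

Only the row whose total is 0 turns into NaN; every other row sums to 1 as intended.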

Here's a quick example that reproduces your current problem:

import numpy as np
import matplotlib.pyplot as plt

def main():
    tags = ['A', 'B', 'C', 'D']
    results = [('A', 'A'), ('B', 'B'), ('C', 'C'), ('A', 'D'), ('C', 'A'),
               ('B', 'B'), ('C', 'B')]
    matrix = confusion_matrix(results, tags)
    plot(matrix, tags)
    plt.show()

def confusion_matrix(results, tagset):
    output = np.zeros((len(tagset), len(tagset)), dtype=float)
    for guessed, real in results:
        output[tagset.index(guessed), tagset.index(real)] += 1
    return output / output.sum(axis=1)[:, None]

def plot(matrix, tags):
    fig, ax = plt.subplots()
    im = ax.matshow(matrix)
    cb = fig.colorbar(im)
    cb.set_label('Percentage Correct')

    ticks = range(len(tags))
    ax.set(xlabel='True Label', ylabel='Predicted Label',
           xticks=ticks, xticklabels=tags, yticks=ticks, yticklabels=tags)
    ax.xaxis.set(label_position='top')
    return fig

main()

[Image: the resulting confusion matrix plot; the bottom row, for the never-guessed tag D, is white.]

And if we look at the confusion matrix itself:

array([[ 0.5  ,  0.   ,  0.   ,  0.5  ],
       [ 0.   ,  1.   ,  0.   ,  0.   ],
       [ 0.333,  0.333,  0.333,  0.   ],
       [   nan,    nan,    nan,    nan]])
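To confirm that the NaN row is what renders as white, you can ask the colormap which RGBA color it assigns to NaN. This sketch assumes matplotlib 3.5 or later (for the matplotlib.colormaps registry) and uses viridis, the default colormap for matshow unless rcParams were changed. The set_bad call is an optional alternative that makes such rows visible instead of hiding them; it does not remove the NaNs.

```python
import numpy as np
import matplotlib

# Copy of the default matshow/imshow colormap (assumed to be viridis here).
cmap = matplotlib.colormaps["viridis"].copy()

# NaN counts as a "bad" value; by default its alpha channel is 0 (transparent),
# so whatever is behind the image (the white axes background) shows through.
print(cmap(np.nan))

# Optional: give bad values a visible color instead.
cmap.set_bad("lightgray")
print(cmap(np.nan))

# You could then plot with: ax.matshow(matrix, cmap=cmap)
```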

If you'd like to avoid the problem when a tag is never guessed, you could do something like this:

def confusion_matrix(results, tagset):
    output = np.zeros((len(tagset), len(tagset)), dtype=float)
    for guessed, real in results:
        output[tagset.index(guessed), tagset.index(real)] += 1
    num_guessed = output.sum(axis=1)[:, None]
    # A never-guessed tag has an all-zero row; dividing it by 1 leaves it at 0
    num_guessed[num_guessed == 0] = 1
    return output / num_guessed
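The same fix can also be written with the out= and where= arguments of np.divide, which only divides where the row total is non-zero and leaves every other entry at whatever out already held (zeros here). This is a swapped-in alternative to the snippet above with the same result: never-guessed rows come out as all zeros. The tags and results are the ones from the reproduction example.

```python
import numpy as np

def confusion_matrix(results, tagset):
    # Rows = guessed (predicted) tag, columns = true tag.
    output = np.zeros((len(tagset), len(tagset)), dtype=float)
    for guessed, real in results:
        output[tagset.index(guessed), tagset.index(real)] += 1
    num_guessed = output.sum(axis=1, keepdims=True)  # shape (n, 1)
    # Divide only where the tag was guessed at least once; other rows stay 0.
    return np.divide(output, num_guessed,
                     out=np.zeros_like(output),
                     where=num_guessed != 0)

tags = ['A', 'B', 'C', 'D']
results = [('A', 'A'), ('B', 'B'), ('C', 'C'), ('A', 'D'), ('C', 'A'),
           ('B', 'B'), ('C', 'B')]
print(confusion_matrix(results, tags))
```

The where mask has shape (n, 1) and broadcasts across each row, so a whole row is skipped at once.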

Which yields (with everything else identical):

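[Image: the same plot, with the bottom row now drawn in the lowest colormap color (value 0) instead of white.]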

Not a direct answer to your question, but this is very easy to do with scikit-learn:

from sklearn.metrics import confusion_matrix
import matplotlib.pyplot as plt

y_test=[2, 1, 0, 2, 0, 2, 0, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 1, 0, 0, 2, 0, 0, 1, 1, 0, 2, 1, 0, 2, 2, 1, 0, 1]
y_pred = [2, 1, 0, 2, 0, 2, 0, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 1, 0, 0, 2, 0, 0, 1, 1, 0, 2, 1, 0, 2, 2, 1, 0, 2]

cm = confusion_matrix(y_test, y_pred)
print(cm)

# Plot confusion matrix
plt.matshow(cm)
plt.title('Confusion matrix')
plt.colorbar()
plt.ylabel('True label')
plt.xlabel('Predicted label')
plt.show()

Output:

[[13  0  0]
 [ 0 15  1]
 [ 0  0  9]]

[Image: the scikit-learn confusion matrix plotted with matshow.]
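scikit-learn can also do the normalization through the normalize parameter of confusion_matrix (available since scikit-learn 0.22). Note that scikit-learn puts the true labels on the rows and the predictions on the columns, the transpose of the layout in the question, so dividing by how often each label was predicted corresponds to normalize='pred' (each column sums to 1), while normalize='true' divides each row by how often that label actually occurred. A minimal sketch reusing the data above:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_test = [2, 1, 0, 2, 0, 2, 0, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 1, 0, 0, 2, 0, 0, 1, 1, 0, 2, 1, 0, 2, 2, 1, 0, 1]
y_pred = [2, 1, 0, 2, 0, 2, 0, 1, 1, 1, 2, 1, 1, 1, 1, 0, 1, 1, 0, 0, 2, 1, 0, 0, 2, 0, 0, 1, 1, 0, 2, 1, 0, 2, 2, 1, 0, 2]

# normalize='pred': divide each column (predicted label) by its total.
cm_pred = confusion_matrix(y_test, y_pred, normalize='pred')
# normalize='true': divide each row (true label) by its total.
cm_true = confusion_matrix(y_test, y_pred, normalize='true')

print(np.round(cm_pred, 3))
print(np.round(cm_true, 3))
```

scikit-learn appears to convert 0/0 entries from a never-predicted label to 0 (it calls np.nan_to_num internally), but it is worth checking this against your installed version.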

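Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.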

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM