Removing rows and columns if all zeros in non-diagonal entries

I am generating a confusion matrix to compare my text classifier's predictions against the ground truth. The purpose is to understand which intents are being predicted as other intents. The problem is that I have too many classes (more than 160), so the matrix is sparse: most of the fields are zeros. The diagonal elements are likely to be non-zero, since they indicate correct predictions.

Given that, I want to generate a simpler version of it. We only care about non-zero elements when they are off the diagonal, so I want to remove the rows and columns where all elements are zero (ignoring the diagonal entries), so that the plot becomes much smaller and easier to view. How can I do that?

Following is the code snippet I have so far; it produces a mapping for all the intents, i.e., a (#intent, #intent)-dimensional plot.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pandas import DataFrame
import seaborn as sns
%matplotlib inline
sns.set(rc={'figure.figsize':(64,64)})

confusion_matrix = pd.crosstab(df['ground_truth_intent_name'], df['predicted_intent_name'])

variables = sorted(list(set(df['ground_truth_intent_name'])))
temp = DataFrame(confusion_matrix, index=variables, columns=variables)

sns.heatmap(temp, annot=True)

TL;DR

Here temp is a pandas DataFrame. I need to remove all rows and columns where all elements are zero (ignoring the diagonal elements, even if they are non-zero).

You can use any on the result of the comparison, but first you need to fill the diagonal of the mask with False (i.e., 0):

# also consider using (more robust for float data)
# a = ~np.isclose(confusion_matrix.to_numpy(), 0)
a = confusion_matrix.to_numpy() != 0

# fill diagonal
np.fill_diagonal(a, False)

# columns with at least one non-zero off-diagonal entry
cols = a.any(axis=0)

# rows with at least one non-zero off-diagonal entry
rows = a.any(axis=1)

# boolean indexing
confusion_matrix.loc[rows, cols]
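
The same steps can be wrapped in a small helper. This is only a sketch and not part of the original answer: the function name drop_zero_offdiag and the keep_square flag are hypothetical, and it assumes the frame is square with rows and columns in the same label order (as temp in the question is meant to be). With keep_square=True a label is kept if either its row or its column has a non-zero off-diagonal entry, so the diagonal stays aligned in the reduced matrix.

```python
import numpy as np
import pandas as pd


def drop_zero_offdiag(cm: pd.DataFrame, keep_square: bool = False) -> pd.DataFrame:
    """Drop rows/columns whose off-diagonal entries are all zero.

    Hypothetical helper; assumes `cm` is square with matching row/column order.
    """
    mask = cm.to_numpy() != 0      # comparison builds a new boolean array
    np.fill_diagonal(mask, False)  # ignore the diagonal (correct predictions)
    rows = mask.any(axis=1)        # rows with a non-zero off-diagonal entry
    cols = mask.any(axis=0)        # columns with a non-zero off-diagonal entry
    if keep_square:
        keep = rows | cols         # keep a label if its row OR column qualifies
        return cm.loc[keep, keep]
    return cm.loc[rows, cols]


# Small hand-written matrix for illustration
cm = pd.DataFrame([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1],
])

reduced = drop_zero_offdiag(cm)
square = drop_zero_offdiag(cm, keep_square=True)
print(reduced)
print(square.shape)
```

Selecting rows and columns independently can give a non-square result, which is fine for a heatmap but means the diagonal of the reduced frame no longer corresponds to correct predictions.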

Let's take an example:

# random data
np.random.seed(1)
# 5x5 matrix of 0s and 1s, to be used with the code above
a = np.random.randint(0,2, (5,5))
a[2] = 0
a[:-1,-1] = 0
confusion_matrix = pd.DataFrame(a)

So the data would be:

   0  1  2  3  4
0  1  1  0  0  0
1  1  1  1  1  0
2  0  0  0  0  0
3  0  0  1  0  0
4  0  1  0  0  1

and the code outputs (notice that the row labeled 2 and the column labeled 4 are gone):

   0  1  2  3
0  1  1  0  0
1  1  1  1  1
3  0  0  1  0
4  0  1  0  0
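
One caveat when applying this to the data from the question: np.fill_diagonal only masks the correct-prediction cells if the frame is square with rows and columns in the same label order. pd.crosstab only emits labels that actually occur on each axis, so the raw crosstab may not be square. The sketch below is an assumption-laden illustration (the toy df and the labels variable are invented; only the column names come from the question): it reindexes on the union of labels with fill_value=0 before filtering.

```python
import numpy as np
import pandas as pd

# Toy data, invented for illustration; column names follow the question.
df = pd.DataFrame({
    "ground_truth_intent_name": ["a", "a", "b", "b", "c"],
    "predicted_intent_name":    ["a", "b", "b", "b", "a"],
})

# "c" is never predicted, so the raw crosstab is 3x2 rather than 3x3.
cm = pd.crosstab(df["ground_truth_intent_name"], df["predicted_intent_name"])

# Union of labels seen on either axis, so no label is silently dropped.
labels = sorted(set(cm.index) | set(cm.columns))
cm = cm.reindex(index=labels, columns=labels, fill_value=0)

# Same filtering as in the answer, now on a square, aligned matrix.
mask = cm.to_numpy() != 0
np.fill_diagonal(mask, False)
reduced = cm.loc[mask.any(axis=1), mask.any(axis=0)]
print(reduced)
```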
