Ensuring pandas.crosstab returns a square matrix

Question

I am currently using pandas.crosstab to generate the confusion matrix of my classifiers after testing. Unfortunately, sometimes my classifier fails and assigns every signal to a single label (instead of multiple labels). In that case pandas.crosstab generates a single vector (or a non-square matrix) instead of a square matrix.
As an example, my ground truth would be

true_data = pandas.Series([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])

and my predicted data is

pred_data = pandas.Series([3, 3, 2, 3, 2, 1, 1, 3, 4, 1])

Applying pandas.crosstab(true_data, pred_data, dropna=False) gives

col_0  1  2  3  4
row_0
1      0  0  2  0
2      0  1  1  0
3      1  1  0  0
4      1  0  1  0
5      1  0  0  1

Is there a way to get

col_0  1  2  3  4  5
row_0
1      0  0  2  0  0
2      0  1  1  0  0
3      1  1  0  0  0
4      1  0  1  0  0
5      1  0  0  1  0

instead, i.e. keeping the matrix square and filling the missing labels with 0?

Answer 1

After calculating the crosstab, you can reindex the DataFrame along both the index and the columns axis.

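# df is the crosstab from the question
df = pandas.crosstab(true_data, pred_data, dropna=False)
i = df.index.union(df.columns)  # every label seen in either series
df.reindex(index=i, columns=i, fill_value=0)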

   1  2  3  4  5
1  0  0  2  0  0
2  0  1  1  0  0
3  1  1  0  0  0
4  1  0  1  0  0
5  1  0  0  1  0
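The reindexing step above can be wrapped in a small helper and tried on the failure case from the question, where the classifier collapses to a single label. This is only a sketch: the name square_crosstab and the collapsed series are made up for illustration and are not part of the original answer.

```python
import pandas as pd


def square_crosstab(true_data, pred_data):
    """Hypothetical helper: crosstab reindexed to the union of all labels."""
    df = pd.crosstab(true_data, pred_data, dropna=False)
    labels = df.index.union(df.columns)  # labels seen in either series
    return df.reindex(index=labels, columns=labels, fill_value=0)


true_data = pd.Series([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
# Assumed failure case: every signal is classified as label 3
collapsed = pd.Series([3] * len(true_data))

cm = square_crosstab(true_data, collapsed)
print(cm)
```

Without the reindex, the crosstab for this input has a single column; with it, the result is 5 x 5 and only column 3 holds nonzero counts.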

Answer 2

You could create a zeros array of the required shape and then replace a portion of the array with the crosstab:

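import numpy as np
import pandas as pd

# Note: pred_data is passed first, so rows are predictions and columns are ground truth
xtab = pd.crosstab(pred_data, true_data, dropna=False).sort_index(axis=0).sort_index(axis=1)
all_unique_values = sorted(set(true_data) | set(pred_data))
z = np.zeros((len(all_unique_values), len(all_unique_values)))
rows, cols = xtab.shape
z[:rows, :cols] = xtab  # copies by position into the top-left corner, not by label
square_xtab = pd.DataFrame(z, columns=all_unique_values, index=all_unique_values)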

Output

     1    2    3    4    5
1  0.0  0.0  1.0  1.0  1.0
2  0.0  1.0  1.0  0.0  0.0
3  2.0  1.0  0.0  1.0  0.0
4  0.0  0.0  0.0  0.0  1.0
5  0.0  0.0  0.0  0.0  0.0

I haven't yet thought through or tested whether this approach works if the mismatch is in the "middle", e.g. if pred_data = [1, 2, 4, 5] and true_data = [1, 2, 3, 4].
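The "middle" case can be checked directly. The sketch below uses the two example lists from the caveat above and compares the positional copy with a label-aligned reindex; the variable names positional and aligned are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

pred_data = pd.Series([1, 2, 4, 5])
true_data = pd.Series([1, 2, 3, 4])

# Rows are predictions, columns are ground truth (same order as the answer)
xtab = pd.crosstab(pred_data, true_data, dropna=False)
labels = sorted(set(true_data) | set(pred_data))

# Positional copy into the top-left corner of a zeros array
z = np.zeros((len(labels), len(labels)))
rows, cols = xtab.shape
z[:rows, :cols] = xtab.to_numpy()
positional = pd.DataFrame(z, index=labels, columns=labels)

# Label-aligned alternative: missing labels become all-zero rows/columns
aligned = xtab.reindex(index=labels, columns=labels, fill_value=0)

# The pair (pred=4, true=3) occurs once in the data
print(positional.loc[4, 3], aligned.loc[4, 3])
```

With these inputs the positional copy files the (pred=4, true=3) count under row label 3, because row position 2 of the crosstab is label 4 while position 2 of the full label list is label 3. The reindexed version keeps every count under its own label, so aligning by label looks like the safer choice whenever a label is missing from the middle of either series.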

Answer 1
2 2022-07-06 14:47:16

Answer 2
1 Accepted 2022-07-06 15:00:55