Ensure that pandas.crosstab returns a square matrix
I am currently using pandas.crosstab to generate the confusion matrix of my classifiers after testing. Unfortunately, my classifier sometimes fails and assigns every signal to a single label (instead of multiple labels). In that case, pandas.crosstab generates a single vector (or a non-square matrix) instead of a square matrix.
As an example, my ground truth would be
true_data = pandas.Series([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
and my predicted data is
pred_data = pandas.Series([3, 3, 2, 3, 2, 1, 1, 3, 4, 1])
Applying pandas.crosstab(true_data, pred_data, dropna=False) gives
col_0 1 2 3 4
row_0
1 0 0 2 0
2 0 1 1 0
3 1 1 0 0
4 1 0 1 0
5 1 0 0 1
Is there a way to get
col_0 1 2 3 4 5
row_0
1 0 0 2 0 0
2 0 1 1 0 0
3 1 1 0 0 0
4 1 0 1 0 0
5 1 0 0 1 0
instead, i.e. keeping the matrix square and filling the missing labels with 0?
After calculating the crosstab, you can reindex the DataFrame along both the index and the columns axes.
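df = pandas.crosstab(true_data, pred_data, dropna=False)
i = df.index.union(df.columns)  # every label that appears in either series
df.reindex(index=i, columns=i, fill_value=0)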
1 2 3 4 5
1 0 0 2 0 0
2 0 1 1 0 0
3 1 1 0 0 0
4 1 0 1 0 0
5 1 0 0 1 0
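As a rough sketch of how the two steps above could be packaged for reuse, the helper below wraps them in one function and runs it on the degenerate case from the question, where the classifier predicts a single label. The name square_crosstab is hypothetical (it is not part of pandas), and the sketch assumes the labels are sortable.

```python
import pandas as pd


def square_crosstab(true_data, pred_data):
    """Hypothetical helper: crosstab padded with zeros to a square matrix."""
    df = pd.crosstab(true_data, pred_data, dropna=False)
    # Union of the labels found on either axis, in sorted order.
    labels = df.index.union(df.columns)
    # Reindexing matches by label; missing rows/columns are filled with 0.
    return df.reindex(index=labels, columns=labels, fill_value=0)


true_data = pd.Series([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
# Degenerate case: every signal is classified as label 3.
pred_data = pd.Series([3] * 10)

result = square_crosstab(true_data, pred_data)
print(result)
```

Because reindex aligns on labels rather than positions, the single predicted column ends up under label 3 and the other columns are all zeros.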
You could create a zeros array of the required shape and then replace a portion of the array with the crosstab.
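import numpy as np
import pandas as pd

# Note: rows are predictions and columns are ground truth here.
xtab = pd.crosstab(pred_data, true_data, dropna=False).sort_index(axis=0).sort_index(axis=1)
all_unique_values = sorted(set(true_data) | set(pred_data))
# Square array of zeros, one row/column per label seen in either series.
z = np.zeros((len(all_unique_values), len(all_unique_values)))
rows, cols = xtab.shape
# Copy the crosstab into the top-left corner (by position, not by label).
z[:rows, :cols] = xtab
square_xtab = pd.DataFrame(z, columns=all_unique_values, index=all_unique_values)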
Output
1 2 3 4 5
1 0.0 0.0 1.0 1.0 1.0
2 0.0 1.0 1.0 0.0 0.0
3 2.0 1.0 0.0 1.0 0.0
4 0.0 0.0 0.0 0.0 1.0
5 0.0 0.0 0.0 0.0 0.0
I haven't yet thought through or tested whether this approach works if the mismatch is in the "middle", e.g. if pred_data = [1, 2, 4, 5] and true_data = [1, 2, 3, 4].
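A quick check of that "middle" case suggests the positional copy does misplace counts, since z[:rows, :cols] fills the top-left corner by position rather than by label. The sketch below uses the example values above and compares the result with a label-based alternative via reindex. The variable names positional and aligned are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

pred_data = pd.Series([1, 2, 4, 5])
true_data = pd.Series([1, 2, 3, 4])

# Rows are predictions, columns are ground truth, as in the answer above.
xtab = pd.crosstab(pred_data, true_data, dropna=False)
labels = sorted(set(true_data) | set(pred_data))

# Positional copy, as in the zeros-array approach.
z = np.zeros((len(labels), len(labels)))
rows, cols = xtab.shape
z[:rows, :cols] = xtab.to_numpy()
positional = pd.DataFrame(z, index=labels, columns=labels)

# Label-based alignment: rows and columns are matched by label.
aligned = xtab.reindex(index=labels, columns=labels, fill_value=0)

# Label 4 was predicted for true label 3, so the count belongs at (4, 3).
print(positional.loc[4, 3], aligned.loc[4, 3])
print(positional.loc[3, 3], aligned.loc[3, 3])
```

In this example the positional version records the count at (3, 3) instead of (4, 3), whereas the reindexed version keeps it at (4, 3). This suggests the zeros-array approach is only safe when the missing labels come last in sorted order.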