![](/img/trans.png)
[英]Python - How to create confusion matrix statistics using python pandas crosstab
[英]How to create a confusion matrix from an incomplete dataframe in python
我有一个看起来像这样的数据框:
I1 I2 V
0 1 1 300
1 1 5 7
2 1 9 3
3 2 2 280
4 2 3 4
5 5 1 5
6 5 5 400
I1和I2代表索引,而V代表值。 值等于0的索引已被省略,但是我想得到一个显示所有值的混淆矩阵,即像这样的东西:
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
我该怎么做?
提前致谢!
使用set_index
与unstack
的重塑,进行追加遗漏值增加reindex
和数据清理rename_axis
:
r = range(1, 10)
df = (df.set_index(['I1','I2'])['V']
.unstack(fill_value=0)
.reindex(index=r, columns=r, fill_value=0)
.rename_axis(None)
.rename_axis(None, axis=1))
print (df)
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
详细说明 :
print (df.set_index(['I1','I2'])['V']
.unstack(fill_value=0))
I2 1 2 3 5 9
I1
1 300 0 0 7 3
2 0 280 4 0 0
5 5 0 0 400 0
如果所有值都是整数,则使用pivot
替代解决方案:
r = range(1, 10)
df = (df.pivot('I1','I2', 'V')
.fillna(0)
.astype(int)
.reindex(index=r, columns=r, fill_value=0)
.rename_axis(None)
.rename_axis(None, axis=1))
print (df)
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
选项1:使用numpy
您可以
In [150]: size = df[['I1', 'I2']].values.max()
In [151]: arr = np.zeros((size, size))
In [152]: arr[df.I1-1, df.I2-1] = df.V
In [153]: idx = np.arange(1, size+1)
In [154]: pd.DataFrame(arr, index=idx, columns=idx).astype(int)
Out[154]:
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
选项2:使用scipy.sparse.csr_matrix
In [178]: from scipy.sparse import csr_matrix
In [179]: size = df[['I1', 'I2']].values.max()
In [180]: idx = np.arange(1, size+1)
In [181]: pd.DataFrame(csr_matrix((df['V'], (df['I1']-1, df['I2']-1)), shape=(size, si
...: ze)).toarray(), index=idx, columns=idx)
Out[181]:
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.