How to create a confusion matrix from an incomplete dataframe in python
I have a dataframe that looks like this:
I1 I2 V
0 1 1 300
1 1 5 7
2 1 9 3
3 2 2 280
4 2 3 4
5 5 1 5
6 5 5 400
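I1 and I2 represent indices, and V represents the value. Index pairs whose value is 0 have been omitted, but I would like to get a confusion matrix that shows all values, i.e. something like this: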
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
How can I do this?
Thanks in advance!
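To try the answers below, the example DataFrame from the question can be rebuilt like this (a minimal sketch; the values are copied from the table above):

```python
import pandas as pd

# Sparse input: only the non-zero (I1, I2) -> V entries are stored
df = pd.DataFrame({
    'I1': [1, 1, 1, 2, 2, 5, 5],
    'I2': [1, 5, 9, 2, 3, 1, 5],
    'V':  [300, 7, 3, 280, 4, 5, 400],
})
print(df)
```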
Use set_index with unstack to reshape, reindex to append the missing labels, and rename_axis to clean up the axis names:
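r = range(1, 10)  # full set of labels 1..9
df = (df.set_index(['I1','I2'])['V']
        .unstack(fill_value=0)                        # I2 values become columns, gaps become 0
        .reindex(index=r, columns=r, fill_value=0)    # add the rows/columns that never occur
        .rename_axis(None)                            # drop the index name 'I1'
        .rename_axis(None, axis=1))                   # drop the columns name 'I2'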
print (df)
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
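The same set_index/unstack/reindex chain can be wrapped in a small reusable function. This is a sketch only: the name `full_matrix` is hypothetical, and when no labels are passed it assumes the labels run from 1 to the largest value found in I1 or I2 instead of hard-coding `range(1, 10)`.

```python
import pandas as pd

def full_matrix(df, labels=None):
    """Expand (I1, I2, V) rows into a square matrix, filling gaps with 0.

    Assumption: without explicit `labels`, rows/columns run from 1 to the
    largest index found in I1 or I2.
    """
    if labels is None:
        labels = range(1, int(df[['I1', 'I2']].to_numpy().max()) + 1)
    return (df.set_index(['I1', 'I2'])['V']
              .unstack(fill_value=0)
              .reindex(index=labels, columns=labels, fill_value=0)
              .rename_axis(index=None, columns=None))  # remove both axis names

df = pd.DataFrame({'I1': [1, 1, 1, 2, 2, 5, 5],
                   'I2': [1, 5, 9, 2, 3, 1, 5],
                   'V': [300, 7, 3, 280, 4, 5, 400]})
m = full_matrix(df)
print(m)
```

Passing `labels` explicitly is still useful when the largest class never appears in the data, since it cannot be inferred from the maximum.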
Details (the intermediate result after unstack):
print (df.set_index(['I1','I2'])['V']
.unstack(fill_value=0))
I2 1 2 3 5 9
I1
1 300 0 0 7 3
2 0 280 4 0 0
5 5 0 0 400 0
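The intermediate result above contains only the rows (1, 2, 5) and columns (1, 2, 3, 5, 9) that occur in the data. A quick way to list the labels that reindex has to add:

```python
import pandas as pd

df = pd.DataFrame({'I1': [1, 1, 1, 2, 2, 5, 5],
                   'I2': [1, 5, 9, 2, 3, 1, 5],
                   'V': [300, 7, 3, 280, 4, 5, 400]})
wide = df.set_index(['I1', 'I2'])['V'].unstack(fill_value=0)
r = range(1, 10)
# Labels that never occur in the data, so reindex must create them filled with 0
missing_rows = sorted(set(r) - set(wide.index))
missing_cols = sorted(set(r) - set(wide.columns))
print(missing_rows)  # [3, 4, 6, 7, 8, 9]
print(missing_cols)  # [4, 6, 7, 8]
```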
Alternative solution with pivot, if all values are integers:
r = range(1, 10)
df = (df.pivot(index='I1', columns='I2', values='V')   # missing pairs become NaN
        .fillna(0)
        .astype(int)                                    # NaN forced a float dtype, so cast back
.reindex(index=r, columns=r, fill_value=0)
.rename_axis(None)
.rename_axis(None, axis=1))
print (df)
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
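Both set_index/unstack and pivot raise a ValueError if the same (I1, I2) pair appears more than once. If duplicates can occur and should be summed (an assumption, not something the question states), pivot_table with `aggfunc='sum'` and `fill_value=0` is one option. The duplicated (1, 5) row below is hypothetical, added only to show the aggregation:

```python
import pandas as pd

# Hypothetical input: (1, 5) appears twice with values 7 and 1
df = pd.DataFrame({'I1': [1, 1, 1, 1, 2, 2, 5, 5],
                   'I2': [1, 5, 5, 9, 2, 3, 1, 5],
                   'V': [300, 7, 1, 3, 280, 4, 5, 400]})
r = range(1, 10)
m = (df.pivot_table(index='I1', columns='I2', values='V',
                    aggfunc='sum', fill_value=0)     # duplicates summed, gaps filled with 0
       .reindex(index=r, columns=r, fill_value=0)    # add labels that never occur
       .rename_axis(index=None, columns=None))
print(m.loc[1, 5])  # 8
```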
Option 1: using numpy
You could fill a zero array directly:
import numpy as np
import pandas as pd

size = df[['I1', 'I2']].values.max()   # largest label, 9 here
arr = np.zeros((size, size), dtype=int)
arr[df.I1 - 1, df.I2 - 1] = df.V       # labels are 1-based, array positions 0-based
idx = np.arange(1, size + 1)
print(pd.DataFrame(arr, index=idx, columns=idx))
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
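With plain fancy-index assignment (`arr[rows, cols] = values`), only one value survives for a repeated (row, col) pair. If repeated pairs should instead be added up (an assumption about the data), `np.add.at` performs an unbuffered in-place add. The duplicated (1, 5) row is hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical input: (1, 5) appears twice with values 7 and 1
df = pd.DataFrame({'I1': [1, 1, 1, 1, 2, 2, 5, 5],
                   'I2': [1, 5, 5, 9, 2, 3, 1, 5],
                   'V': [300, 7, 1, 3, 280, 4, 5, 400]})
size = int(df[['I1', 'I2']].to_numpy().max())
arr = np.zeros((size, size), dtype=int)
rows = df['I1'].to_numpy() - 1  # 1-based labels -> 0-based positions
cols = df['I2'].to_numpy() - 1
# Unbuffered add: repeated (row, col) pairs accumulate instead of overwriting
np.add.at(arr, (rows, cols), df['V'].to_numpy())
idx = np.arange(1, size + 1)
m = pd.DataFrame(arr, index=idx, columns=idx)
print(m.loc[1, 5])  # 8
```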
Option 2: using scipy.sparse.csr_matrix
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

size = df[['I1', 'I2']].values.max()
idx = np.arange(1, size + 1)
# Build the sparse matrix from (value, (row, col)) triplets, shifting labels to 0-based positions
m = csr_matrix((df['V'], (df['I1'] - 1, df['I2'] - 1)), shape=(size, size))
print(pd.DataFrame(m.toarray(), index=idx, columns=idx))
1 2 3 4 5 6 7 8 9
1 300 0 0 0 7 0 0 0 3
2 0 280 4 0 0 0 0 0 0
3 0 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0
5 5 0 0 0 400 0 0 0 0
6 0 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 0
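If the label set is large and mostly empty, it may be preferable not to densify at all. A sketch using pandas' sparse accessor, `pd.DataFrame.sparse.from_spmatrix`, keeps the csr_matrix from Option 2 in sparse form and converts to a dense frame only when needed:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

df = pd.DataFrame({'I1': [1, 1, 1, 2, 2, 5, 5],
                   'I2': [1, 5, 9, 2, 3, 1, 5],
                   'V': [300, 7, 3, 280, 4, 5, 400]})
size = int(df[['I1', 'I2']].to_numpy().max())
mat = csr_matrix((df['V'].to_numpy(),
                  (df['I1'].to_numpy() - 1, df['I2'].to_numpy() - 1)),
                 shape=(size, size))
idx = np.arange(1, size + 1)
# Columns are stored as SparseArrays; only the non-zero cells take memory
sdf = pd.DataFrame.sparse.from_spmatrix(mat, index=idx, columns=idx)
print(sdf.sparse.density)  # fraction of cells actually stored
dense = sdf.sparse.to_dense()
```

Note that when building a csr_matrix from (value, (row, col)) triplets, duplicate pairs are summed rather than overwritten.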