
How to create a confusion matrix from an incomplete dataframe in python

I have a dataframe that looks like this:

   I1  I2    V
0   1   1  300
1   1   5    7
2   1   9    3
3   2   2  280
4   2   3    4
5   5   1    5
6   5   5  400
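To reproduce the example, the frame above can be built directly. The values are copied from the table; nothing else is assumed:

```python
import pandas as pd

# Sample data from the question: (I1, I2) label pairs with their non-zero counts V
df = pd.DataFrame({
    'I1': [1, 1, 1, 2, 2, 5, 5],
    'I2': [1, 5, 9, 2, 3, 1, 5],
    'V':  [300, 7, 3, 280, 4, 5, 400],
})
print(df)
```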

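I1 and I2 represent indices, and V represents values. Index pairs whose value equals 0 have been omitted, but I would like to get a confusion matrix that shows all values, i.e. something like this: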

   1   2   3   4   5   6   7   8   9
1  300 0   0   0   7   0   0   0   3
2  0   280 4   0   0   0   0   0   0
3  0   0   0   0   0   0   0   0   0
4  0   0   0   0   0   0   0   0   0
5  5   0   0   0   400 0   0   0   0
6  0   0   0   0   0   0   0   0   0
7  0   0   0   0   0   0   0   0   0
8  0   0   0   0   0   0   0   0   0
9  0   0   0   0   0   0   0   0   0

How can I do this?

Thanks in advance!

Use set_index and unstack to reshape, add reindex to append the missing labels, and clean up the axis names with rename_axis:

r = range(1, 10)
df = (df.set_index(['I1','I2'])['V']
        .unstack(fill_value=0)
        .reindex(index=r, columns=r, fill_value=0)
        .rename_axis(None)
        .rename_axis(None, axis=1))
print (df)
     1    2  3  4    5  6  7  8  9
1  300    0  0  0    7  0  0  0  3
2    0  280  4  0    0  0  0  0  0
3    0    0  0  0    0  0  0  0  0
4    0    0  0  0    0  0  0  0  0
5    5    0  0  0  400  0  0  0  0
6    0    0  0  0    0  0  0  0  0
7    0    0  0  0    0  0  0  0  0
8    0    0  0  0    0  0  0  0  0
9    0    0  0  0    0  0  0  0  0
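The range(1, 10) above is hard-coded for this example. A minimal sketch that derives the label range from the data instead, wrapped in a hypothetical helper named to_confusion_matrix (not a pandas function), assuming labels are positive integers starting at 1:

```python
import pandas as pd

def to_confusion_matrix(df, row='I1', col='I2', val='V', labels=None):
    """Hypothetical helper: pivot sparse (row, col, value) records into a dense
    square matrix. If labels is None, it defaults to 1..max label found in
    either column (assumes 1-based integer labels)."""
    if labels is None:
        labels = range(1, int(df[[row, col]].to_numpy().max()) + 1)
    return (df.set_index([row, col])[val]
              .unstack(fill_value=0)                                  # reshape, 0 in gaps
              .reindex(index=labels, columns=labels, fill_value=0)    # add missing labels
              .rename_axis(None)                                      # drop index name
              .rename_axis(None, axis=1))                             # drop columns name

df = pd.DataFrame({
    'I1': [1, 1, 1, 2, 2, 5, 5],
    'I2': [1, 5, 9, 2, 3, 1, 5],
    'V':  [300, 7, 3, 280, 4, 5, 400],
})
cm = to_confusion_matrix(df)
print(cm)
```

If a label exists but never appears in either column (for example a class 10 that was never predicted), the derived range will miss it, so pass labels explicitly in that case.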

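Details: the reshape step on its own contains only the labels that occur in the data, which is why reindex is needed afterwards: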

print (df.set_index(['I1','I2'])['V']
        .unstack(fill_value=0))
I2    1    2  3    5  9
I1                     
1   300    0  0    7  3
2     0  280  4    0  0
5     5    0  0  400  0
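To illustrate why fill_value=0 is passed to unstack, here is a small sketch comparing the two calls on the question's data. Without it, the missing (I1, I2) combinations become NaN, which forces the columns to float64:

```python
import pandas as pd

df = pd.DataFrame({
    'I1': [1, 1, 1, 2, 2, 5, 5],
    'I2': [1, 5, 9, 2, 3, 1, 5],
    'V':  [300, 7, 3, 280, 4, 5, 400],
})
s = df.set_index(['I1', 'I2'])['V']

# No fill_value: absent combinations are NaN, so every column is float
wide = s.unstack()
print(wide.dtypes)
print('empty cells:', int(wide.isna().sum().sum()))

# fill_value=0: gaps are filled during the reshape and integer dtype is kept
filled = s.unstack(fill_value=0)
print(filled.dtypes)
```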

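Alternative solution with pivot, if all values are integers (pivot leaves NaN in the gaps, so fillna(0) and astype(int) turn it back into an integer matrix):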

r = range(1, 10)
df = (df.pivot(index='I1', columns='I2', values='V')
        .fillna(0)
        .astype(int)
        .reindex(index=r, columns=r, fill_value=0)
        .rename_axis(None)
        .rename_axis(None, axis=1))
print (df)
     1    2  3  4    5  6  7  8  9
1  300    0  0  0    7  0  0  0  3
2    0  280  4  0    0  0  0  0  0
3    0    0  0  0    0  0  0  0  0
4    0    0  0  0    0  0  0  0  0
5    5    0  0  0  400  0  0  0  0
6    0    0  0  0    0  0  0  0  0
7    0    0  0  0    0  0  0  0  0
8    0    0  0  0    0  0  0  0  0
9    0    0  0  0    0  0  0  0  0
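One caveat with pivot: it raises a ValueError if the same (I1, I2) pair appears more than once. A sketch of swapping in pivot_table, which aggregates duplicates instead (summed here, as an assumption about what a repeated pair should mean), using a hypothetical frame in which (1, 5) appears twice:

```python
import pandas as pd

# Hypothetical input with a repeated (I1, I2) pair: (1, 5) occurs twice
dup = pd.DataFrame({
    'I1': [1, 1, 1, 2],
    'I2': [1, 5, 5, 2],
    'V':  [300, 7, 2, 280],
})

# pivot() refuses duplicate index/column pairs
pivot_failed = False
try:
    dup.pivot(index='I1', columns='I2', values='V')
except ValueError as exc:
    pivot_failed = True
    print('pivot failed:', exc)

# pivot_table() sums the duplicates and fills empty cells with 0
r = range(1, 10)
cm = (dup.pivot_table(index='I1', columns='I2', values='V',
                      aggfunc='sum', fill_value=0)
         .reindex(index=r, columns=r, fill_value=0)
         .rename_axis(None)
         .rename_axis(None, axis=1))
print(cm)
```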

Option 1: using numpy, you can do:

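import numpy as np
import pandas as pd

# Matrix size = largest label seen in I1 or I2 (labels are 1-based)
size = df[['I1', 'I2']].values.max()
arr = np.zeros((size, size), dtype=int)
# Scatter V into the matrix; subtract 1 to turn labels into 0-based positions
arr[df['I1'] - 1, df['I2'] - 1] = df['V']
idx = np.arange(1, size + 1)
print(pd.DataFrame(arr, index=idx, columns=idx))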
     1    2  3  4    5  6  7  8  9
1  300    0  0  0    7  0  0  0  3
2    0  280  4  0    0  0  0  0  0
3    0    0  0  0    0  0  0  0  0
4    0    0  0  0    0  0  0  0  0
5    5    0  0  0  400  0  0  0  0
6    0    0  0  0    0  0  0  0  0
7    0    0  0  0    0  0  0  0  0
8    0    0  0  0    0  0  0  0  0
9    0    0  0  0    0  0  0  0  0
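With plain fancy-index assignment, a repeated (I1, I2) pair keeps only one of its values, and NumPy does not guarantee which one. If repeated pairs should accumulate, np.add.at (an unbuffered in-place add) is one option. A sketch on the same hypothetical duplicate data as above, where (1, 5) appears twice:

```python
import numpy as np
import pandas as pd

# Hypothetical input with a repeated (I1, I2) pair: (1, 5) occurs twice
dup = pd.DataFrame({
    'I1': [1, 1, 1, 2],
    'I2': [1, 5, 5, 2],
    'V':  [300, 7, 2, 280],
})

size = int(dup[['I1', 'I2']].to_numpy().max())
arr = np.zeros((size, size), dtype=np.int64)
rows = dup['I1'].to_numpy() - 1   # 0-based row positions
cols = dup['I2'].to_numpy() - 1   # 0-based column positions
# Unbuffered add: every occurrence of a (row, col) pair is added in
np.add.at(arr, (rows, cols), dup['V'].to_numpy())

idx = np.arange(1, size + 1)
cm = pd.DataFrame(arr, index=idx, columns=idx)
print(cm)
```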

Option 2: using scipy.sparse.csr_matrix:

import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

size = df[['I1', 'I2']].values.max()
idx = np.arange(1, size + 1)
# Build a sparse matrix from (value, (row, col)) triplets, then densify it;
# any duplicate (I1, I2) pairs would be summed
mat = csr_matrix((df['V'], (df['I1'] - 1, df['I2'] - 1)), shape=(size, size))
print(pd.DataFrame(mat.toarray(), index=idx, columns=idx))
     1    2  3  4    5  6  7  8  9
1  300    0  0  0    7  0  0  0  3
2    0  280  4  0    0  0  0  0  0
3    0    0  0  0    0  0  0  0  0
4    0    0  0  0    0  0  0  0  0
5    5    0  0  0  400  0  0  0  0
6    0    0  0  0    0  0  0  0  0
7    0    0  0  0    0  0  0  0  0
8    0    0  0  0    0  0  0  0  0
9    0    0  0  0    0  0  0  0  0
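When there are many labels and most cells are zero, the .toarray() step can be skipped. A sketch that wraps the CSR matrix in a pandas sparse-backed DataFrame via pd.DataFrame.sparse.from_spmatrix, and only densifies on demand:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

df = pd.DataFrame({
    'I1': [1, 1, 1, 2, 2, 5, 5],
    'I2': [1, 5, 9, 2, 3, 1, 5],
    'V':  [300, 7, 3, 280, 4, 5, 400],
})

size = int(df[['I1', 'I2']].to_numpy().max())
idx = np.arange(1, size + 1)
mat = csr_matrix(
    (df['V'].to_numpy(), (df['I1'].to_numpy() - 1, df['I2'].to_numpy() - 1)),
    shape=(size, size),
)

# Sparse-backed frame: only the 7 non-zero cells are stored
sparse_cm = pd.DataFrame.sparse.from_spmatrix(mat, index=idx, columns=idx)
print(sparse_cm.sparse.density)   # fraction of cells actually stored

# Convert to an ordinary dense frame only when needed
dense_cm = sparse_cm.sparse.to_dense()
print(dense_cm)
```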


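Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to repost, please credit this site's URL or the original address. For any questions, please contact: yoyou2525@163.com.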

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM