[英]Pandas: Transpose, groupby and summarize columns
我有一個像這樣的pandas DataFrame:
| Id | Filter 1 | Filter 2 | Filter 3 |
|----|----------|----------|----------|
| 25 | 0 | 1 | 1 |
| 25 | 1 | 0 | 1 |
| 25 | 0 | 0 | 1 |
| 30 | 1 | 0 | 1 |
| 31 | 1 | 0 | 1 |
| 31 | 0 | 1 | 0 |
| 31 | 0 | 0 | 1 |
我需要轉置此表,添加帶有過濾器名稱的“名稱”列,並匯總過濾器列值。 結果表應該是這樣的:
| Id | Name | Summ |
| 25 | Filter 1 | 1 |
| 25 | Filter 2 | 1 |
| 25 | Filter 3 | 3 |
| 30 | Filter 1 | 1 |
| 30 | Filter 2 | 0 |
| 30 | Filter 3 | 1 |
| 31 | Filter 1 | 1 |
| 31 | Filter 2 | 1 |
| 31 | Filter 3 | 2 |
我到目前為止唯一的解決方案是使用由Id列分組的應用函數,但這個方法對我的情況來說太慢了 - 數據集可以超過40列和50_000行,我怎么能用pandas本機方法做到這一點? (例如Pivot,Transpose,Groupby)
采用:
df_new=df.melt('Id',var_name='Name',value_name='Sum').groupby(['Id','Name']).Sum.sum()\
.reset_index()
print(df_new)
Id Name Sum
0 25 Filter 1 1
1 25 Filter 2 1
2 25 Filter 3 3
3 30 Filter 1 1
4 30 Filter 2 0
5 30 Filter 3 1
6 31 Filter 1 1
7 31 Filter 2 1
8 31 Filter 3 1
然后stack
groupby
df.set_index('Id').stack().groupby(level=[0,1]).sum().reset_index()
Id level_1 0
0 25 Filter 1 1
1 25 Filter 2 1
2 25 Filter 3 3
3 30 Filter 1 1
4 30 Filter 2 0
5 30 Filter 3 1
6 31 Filter 1 1
7 31 Filter 2 1
8 31 Filter 3 1
簡潔版本
df.set_index('Id').sum(level=0).stack()#df.groupby('Id').sum().stack()
使用filter
和melt
df.filter(like='Filter').groupby(df.Id).sum().T.reset_index().melt(id_vars='index')
index Id value
0 Filter 1 25 1
1 Filter 2 25 1
2 Filter 3 25 3
3 Filter 1 30 1
4 Filter 2 30 0
5 Filter 3 30 1
6 Filter 1 31 1
7 Filter 2 31 1
8 Filter 3 31 2
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.