How to apply aggregate function with a condition on a pivot table in Pandas?
My dataframe looks "like" this:
index name method values
0. A estimated 4874
1. A counted 847
2. A estimated 1152
3. B estimated 276
4. B counted 6542
5. B counted 1152
6. B estimated 3346
7. C counted 7622
8. C estimated 26
...
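What I want to do is sum the "estimated" and the "counted" values separately for each "name". I tried to do it with pivot_table, as in the code below, but I can only do this for one method at a time. Is there a way to do it for both methods in the same code?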
count = df.groupby(['name']).apply(lambda sub_df: sub_df\
.pivot_table(index=['method'], values=['values'],
aggfunc= {'values': lambda x: x[df.iloc[x.index['method']=='estimated'].sum()},
margins=True, margins_name == 'total_estimated')
count
What I ultimately want to get is something like this:
index name method values
0. A estimated 4874
1. A counted 847
2. A estimated 1152
3. A total_counted 847
4. A total_estimated 6026
5. B estimated 276
6. B counted 6542
7. B counted 1152
8. B estimated 3346
9. B total_counted 7694
10. B total_estimated 3622
11. C counted 7622
12. C estimated 26
13. C total_counted 7622
14. C total_estimated 26
...
Use DataFrame.pivot_table to compute the totals. We can then attach them to the original DataFrame with either DataFrame.stack + DataFrame.join or DataFrame.melt + DataFrame.merge:
# If 'index' is a column, make it the index first:
# df = df.set_index('index')
new_df = (df.join(df.pivot_table(index='name',
                                 columns='method',
                                 values='values',
                                 aggfunc='sum')
                    .add_prefix('total_')
                    .stack()
                    .rename('new_value'),
                  on=['name', 'method'], how='outer')
            .assign(values=lambda x: x['values'].fillna(x['new_value']))
            .drop(columns='new_value')
            .sort_values(['name', 'method'])
         )
print(new_df)
Or
# If 'index' is a column, make it the index first:
# df = df.set_index('index')
new_df = (df.merge(df.pivot_table(index='name',
                                  columns='method',
                                  values='values',
                                  aggfunc='sum')
                     .add_prefix('total_')
                     .T
                     .reset_index()
                     .melt('method', value_name='values'),
                   on=['name', 'method'], how='outer')
            .assign(values=lambda x: x['values_x'].fillna(x['values_y']))
            .loc[:, df.columns]
            .sort_values(['name', 'method'])
         )
print(new_df)
Output
name method values
2 A counted 847.0
0 A estimated 4874.0
1 A estimated 1152.0
9 A total_counted 847.0
10 A total_estimated 6026.0
5 B counted 6542.0
6 B counted 1152.0
3 B estimated 276.0
4 B estimated 3346.0
11 B total_counted 7694.0
12 B total_estimated 3622.0
7 C counted 7622.0
8 C estimated 26.0
13 C total_counted 7622.0
14 C total_estimated 26.0
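To make the stack + join route easier to follow, here is a minimal end-to-end sketch that rebuilds the sample frame from the question and splits the chain into named steps. The intermediate names `wide` and `totals` are introduced here for illustration only, and the final `reset_index(drop=True)` is an addition to give the result a clean index.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name":   ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated", "counted",
               "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

# Step 1: one row per name, one column per method, each cell holding a sum.
wide = df.pivot_table(index="name", columns="method",
                      values="values", aggfunc="sum")

# Step 2: relabel the columns, then stack them into a Series
# indexed by (name, method), e.g. ("A", "total_estimated").
totals = wide.add_prefix("total_").stack().rename("new_value")

# Step 3: outer-join on (name, method). No original row matches a
# "total_*" key, so the totals arrive as extra rows whose 'values'
# is NaN; fillna then copies the total into 'values'.
new_df = (df.join(totals, on=["name", "method"], how="outer")
            .assign(values=lambda x: x["values"].fillna(x["new_value"]))
            .drop(columns="new_value")
            .sort_values(["name", "method"])
            .reset_index(drop=True))
print(new_df)
```

The 'values' column comes out as float because the outer join introduces NaN before the fill; append `.astype({'values': int})` if integers are needed.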
But if I were you, I would use DataFrame.add_suffix instead:
new_df = (df.join(df.pivot_table(index='name',
                                 columns='method',
                                 values='values',
                                 aggfunc='sum')
                    .add_suffix('_total')
                    .stack()
                    .rename('new_value'),
                  on=['name', 'method'], how='outer')
            .assign(values=lambda x: x['values'].fillna(x['new_value']))
            .drop(columns='new_value')
            .sort_values(['name', 'method'])
         )
print(new_df)
name method values
index
1.0 A counted 847.0
8.0 A counted_total 847.0
0.0 A estimated 4874.0
2.0 A estimated 1152.0
8.0 A estimated_total 6026.0
4.0 B counted 6542.0
5.0 B counted 1152.0
8.0 B counted_total 7694.0
3.0 B estimated 276.0
6.0 B estimated 3346.0
8.0 B estimated_total 3622.0
7.0 C counted 7622.0
8.0 C counted_total 7622.0
8.0 C estimated 26.0
8.0 C estimated_total 26.0
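The suffix form has a practical advantage: 'counted_total' sorts directly after 'counted', so each total lands right below the rows it summarizes. As a rough sketch of the same layout by a different route, the totals can also be built with groupby and appended with pd.concat. This is a swapped-in technique, not the pivot_table approach used above, and the sample frame is rebuilt from the question's data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "name":   ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated", "counted",
               "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

# One total per (name, method) pair, relabelled with a '_total' suffix.
totals = (df.groupby(["name", "method"], as_index=False)["values"].sum()
            .assign(method=lambda t: t["method"] + "_total"))

# Append the totals to the original rows and sort so that each total
# follows the rows of its own method.
new_df = (pd.concat([df, totals], ignore_index=True)
            .sort_values(["name", "method"])
            .reset_index(drop=True))
print(new_df)
```

Because no NaN is introduced along the way, 'values' keeps its integer dtype here.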