How to apply an aggregate function with a condition on a pivot table in Pandas?
My data frame looks like this:
index name method values
0. A estimated 4874
1. A counted 847
2. A estimated 1152
3. B estimated 276
4. B counted 6542
5. B counted 1152
6. B estimated 3346
7. C counted 7622
8. C estimated 26
...
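For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the nine rows shown above. It assumes the elided "..." rows can be ignored and that "index" is meant to be the DataFrame index; the variable name raw is only illustrative.

```python
import io

import pandas as pd

# The nine visible rows from the table above; the elided "..." rows are left out.
raw = """index name method values
0 A estimated 4874
1 A counted 847
2 A estimated 1152
3 B estimated 276
4 B counted 6542
5 B counted 1152
6 B estimated 3346
7 C counted 7622
8 C estimated 26
"""

# Parse the whitespace-separated text and use the first column as the index.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col="index")
print(df)
```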
What I want to do is sum, for each "name", the totals of the "estimated" and "counted" values. I tried to do it with pivot_table as in the code below, but I can only do it for one of the methods at a time. Is there a way to do it for both methods in the same code?
count = df.groupby(['name']).apply(lambda sub_df: sub_df\
.pivot_table(index=['method'], values=['values'],
aggfunc= {'values': lambda x: x[df.iloc[x.index['method']=='estimated'].sum()},
margins=True, margins_name == 'total_estimated')
count
What I want to get in the end looks like this:
index name method values
0. A estimated 4874
1. A counted 847
2. A estimated 1152
3. A total_counted 847
4. A total_estimated 6026
5. B estimated 276
6. B counted 6542
7. B counted 1152
8. B estimated 3346
9. B total_counted 7694
10. B total_estimated 3622
11. C counted 7622
12. C estimated 26
13. C total_counted 7622
14. C total_estimated 26
...
Use DataFrame.pivot_table to compute the sums. Then we can attach them to the original DataFrame with DataFrame.stack + DataFrame.join, or with DataFrame.melt + DataFrame.merge:
# if index is a column
# df = df.set_index('index')
new_df = (df.join(df.pivot_table(index='name',
                                 columns='method',
                                 values='values',
                                 aggfunc='sum')
                    .add_prefix('total_')
                    .stack()
                    .rename('new_value'),
                  on=['name', 'method'], how='outer')
            .assign(values=lambda x: x['values'].fillna(x['new_value']))
            .drop(columns='new_value')
            .sort_values(['name', 'method'])
          )
print(new_df)
or
# if index is a column
# df = df.set_index('index')
new_df = (df.merge(df.pivot_table(index='name',
                                  columns='method',
                                  values='values',
                                  aggfunc='sum')
                     .add_prefix('total_')
                     .T
                     .reset_index()
                     .melt('method', value_name='values'),
                   on=['name', 'method'], how='outer')
            .assign(values=lambda x: x['values_x'].fillna(x['values_y']))
            .loc[:, df.columns]
            .sort_values(['name', 'method'])
          )
print(new_df)
Output
name method values
2 A counted 847.0
0 A estimated 4874.0
1 A estimated 1152.0
9 A total_counted 847.0
10 A total_estimated 6026.0
5 B counted 6542.0
6 B counted 1152.0
3 B estimated 276.0
4 B estimated 3346.0
11 B total_counted 7694.0
12 B total_estimated 3622.0
7 C counted 7622.0
8 C estimated 26.0
13 C total_counted 7622.0
14 C total_estimated 26.0
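To see what the pivot step contributes on its own, the following sketch prints the two intermediate objects. It assumes only the nine sample rows from the question; the names wide and totals are illustrative and do not appear in the code above.

```python
import pandas as pd

# Sample rows from the question (the elided "..." rows are not included).
df = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated", "counted",
               "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

# Wide table: one row per name, one column per method, each cell holds the sum.
wide = df.pivot_table(index="name", columns="method",
                      values="values", aggfunc="sum")

# Prefix the column labels, then stack back into a long Series
# indexed by (name, method), ready to be joined onto the original rows.
totals = wide.add_prefix("total_").stack().rename("new_value")

print(wide)
print(totals)
```

Because totals is indexed by (name, method), joining it on those two columns with how='outer' adds the total rows that have no match in the original data.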
But if I were you I would use DataFrame.add_suffix instead, so that each total row sorts right after the rows it summarizes:
new_df = (df.join(df.pivot_table(index='name',
                                 columns='method',
                                 values='values',
                                 aggfunc='sum')
                    .add_suffix('_total')
                    .stack()
                    .rename('new_value'),
                  on=['name', 'method'], how='outer')
            .assign(values=lambda x: x['values'].fillna(x['new_value']))
            .drop(columns='new_value')
            .sort_values(['name', 'method'])
          )
print(new_df)
name method values
index
1.0 A counted 847.0
8.0 A counted_total 847.0
0.0 A estimated 4874.0
2.0 A estimated 1152.0
8.0 A estimated_total 6026.0
4.0 B counted 6542.0
5.0 B counted 1152.0
8.0 B counted_total 7694.0
3.0 B estimated 276.0
6.0 B estimated 3346.0
8.0 B estimated_total 3622.0
7.0 C counted 7622.0
8.0 C counted_total 7622.0
8.0 C estimated 26.0
8.0 C estimated_total 26.0
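As a possible alternative, and not the approach used in the answer above, the same layout can be sketched with groupby and pd.concat. This avoids the outer join, so the values column stays integer and the index is simply renumbered. The sample data is limited to the nine rows shown in the question, and the name totals is illustrative.

```python
import pandas as pd

# Sample rows from the question (the elided "..." rows are not included).
df = pd.DataFrame({
    "name": ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated", "counted",
               "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

# One row per (name, method) pair holding the summed values.
totals = df.groupby(["name", "method"], as_index=False)["values"].sum()

# Relabel the method so total rows are distinguishable from the raw rows.
totals["method"] = totals["method"] + "_total"

# Append the totals, then sort so each total follows the rows it summarizes
# ("counted" sorts before "counted_total", and so on).
new_df = (pd.concat([df, totals], ignore_index=True)
            .sort_values(["name", "method"])
            .reset_index(drop=True))
print(new_df)
```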