How to apply an aggregate function with a condition on a pivot table in Pandas?

My data frame looks like this:

index   name     method     values
0.      A       estimated     4874
1.      A       counted        847
2.      A       estimated     1152
3.      B       estimated      276
4.      B       counted       6542
5.      B       counted       1152
6.      B       estimated     3346
7.      C       counted       7622
8.      C       estimated       26
...

What I want to do is sum, for each "name", the totals of the "estimated" and "counted" values. I tried to do it with pivot_table as in the code below, but I can only do it for one of the methods at a time. Is there a way to do it for both methods in the same code?

# Attempt: this only handles one method ('estimated') at a time
count = df.groupby('name').apply(
    lambda sub_df: sub_df[sub_df['method'] == 'estimated']
        .pivot_table(index='method', values='values', aggfunc='sum',
                     margins=True, margins_name='total_estimated'))
count

What I want to get in the end looks like this:

index   name     method       values
0.      A       estimated       4874
1.      A       counted          847
2.      A       estimated       1152
3.      A    total_counted       847
4.      A   total_estimated     6026
5.      B       estimated        276
6.      B       counted         6542
7.      B       counted         1152
8.      B       estimated       3346
9.      B    total_counted      7694
10.     B   total_estimated     3622
11.     C       counted         7622
12.     C       estimated         26
13.     C    total_counted      7622
14.     C   total_estimated       26
...

Use DataFrame.pivot_table to compute the sums, then attach them to the original DataFrame with either DataFrame.stack + DataFrame.join or DataFrame.melt + DataFrame.merge:

# if 'index' is a column, make it the index first:
# df = df.set_index('index')
new_df = (df.join(df.pivot_table(index = 'name',
                                  columns = 'method',
                                  values = 'values',
                                  aggfunc = 'sum')
                    .add_prefix('total_') 
                    .stack()
                    .rename('new_value'),
                  on = ['name','method'],how = 'outer')

            .assign(values = lambda x: x['values'].fillna(x['new_value']))
            .drop(columns = 'new_value')
            .sort_values(['name','method'])
)
print(new_df)

or

# if 'index' is a column, make it the index first:
# df = df.set_index('index')
new_df = (df.merge(df.pivot_table(index = 'name',
                                  columns = 'method',
                                  values = 'values',
                                  aggfunc = 'sum')
            .add_prefix('total_')         
            .T
            .reset_index()
            .melt('method',value_name = 'values'),
                   on = ['name','method'],how = 'outer')
            .assign(values = lambda x: x['values_x'].fillna(x['values_y']))
            .loc[:,df.columns]
            .sort_values(['name','method'])
)
print(new_df)

Output

   name           method  values
2     A          counted   847.0
0     A        estimated  4874.0
1     A        estimated  1152.0
9     A    total_counted   847.0
10    A  total_estimated  6026.0
5     B          counted  6542.0
6     B          counted  1152.0
3     B        estimated   276.0
4     B        estimated  3346.0
11    B    total_counted  7694.0
12    B  total_estimated  3622.0
7     C          counted  7622.0
8     C        estimated    26.0
13    C    total_counted  7622.0
14    C  total_estimated    26.0
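
To make the join step easier to follow, here is a minimal sketch of the intermediate result. It assumes the data consists only of the nine sample rows shown in the question, and the names totals and stacked are purely illustrative. The pivot table holds one sum per name and method, and stack turns it into a Series keyed by (name, method), which is what gets joined back onto the original rows.

```python
import pandas as pd

# Sample data copied from the question (only the nine rows shown there).
df = pd.DataFrame({
    "name":   ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated", "counted",
               "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

# One row per name, one column per method; each cell is the sum of 'values'.
totals = df.pivot_table(index="name", columns="method",
                        values="values", aggfunc="sum")

# Prefix the column labels, then move them into the row index so that
# every total is addressed by a (name, method) pair.
stacked = totals.add_prefix("total_").stack().rename("new_value")
print(stacked)
```

For example, the entry for ("A", "total_estimated") is 4874 + 1152 = 6026. None of these keys match an existing row, so the outer join appends them as new rows.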

But if I were you, I would use DataFrame.add_suffix instead, so that each total sorts directly after the rows it summarizes:

new_df = (df.join(df.pivot_table(index = 'name',
                                  columns = 'method',
                                  values = 'values',
                                  aggfunc = 'sum')
                    .add_suffix('_total') 
                    .stack()
                    .rename('new_value'),
                  on = ['name','method'],how = 'outer')

            .assign(values = lambda x: x['values'].fillna(x['new_value']))
            .drop(columns = 'new_value')
            .sort_values(['name','method'])
         )
print(new_df)

      name           method  values
index                              
1.0      A          counted   847.0
8.0      A    counted_total   847.0
0.0      A        estimated  4874.0
2.0      A        estimated  1152.0
8.0      A  estimated_total  6026.0
4.0      B          counted  6542.0
5.0      B          counted  1152.0
8.0      B    counted_total  7694.0
3.0      B        estimated   276.0
6.0      B        estimated  3346.0
8.0      B  estimated_total  3622.0
7.0      C          counted  7622.0
8.0      C    counted_total  7622.0
8.0      C        estimated    26.0
8.0      C  estimated_total    26.0
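
As a possible alternative that skips the join, the totals could also be computed with groupby and appended with pd.concat. This is only a sketch and not part of the answer above: it assumes the nine sample rows from the question, and totals and result are illustrative names.

```python
import pandas as pd

# Sample data copied from the question (only the nine rows shown there).
df = pd.DataFrame({
    "name":   ["A", "A", "A", "B", "B", "B", "B", "C", "C"],
    "method": ["estimated", "counted", "estimated", "estimated", "counted",
               "counted", "estimated", "counted", "estimated"],
    "values": [4874, 847, 1152, 276, 6542, 1152, 3346, 7622, 26],
})

# Sum the values for every (name, method) pair.
totals = df.groupby(["name", "method"], as_index=False)["values"].sum()

# Label the summary rows, e.g. "counted" becomes "counted_total".
totals["method"] = totals["method"] + "_total"

# Append the totals, then sort so each total follows the rows it summarizes.
result = (pd.concat([df, totals], ignore_index=True)
            .sort_values(["name", "method"])
            .reset_index(drop=True))
print(result)
```

Because no missing values are created and filled along the way, the values column keeps its integer dtype here instead of being converted to float.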
