Pandas value_counts group sum dependent on another column within a groupby.agg function

I am currently aggregating a set of variables grouped by var1 and var2. Suppose I have continuous variables var3, var4 and var5, for which I can easily calculate the mean, std, min, max and median within each group. However, I also have two more variables, var6 and var7, where var6 is a categorical variable and var7 is a continuous variable that gives the size of each var6 entry. My dataframe looks as follows:

var1 var2 var3 var4 var5 var6 var7
1    a    3    8     9   0    125
1    a    4    0     12  0    12 
1    a    12   4     12  2    3
1    b    24   5     1   1    45
2    a    1    19    4   1    76
2    a    2    37    12  1    12
2    c    3    93    156 1    341
2    c    57   1     87  2    73
2    c    42   4     95  2    95
3    b    12   11    0   0    11
3    b    119  0     901 0    5

The first part, the grouped aggregation itself, is easy. For example:

desired_df = my_df.groupby(['var1', 'var2']).agg(
    max_var3=('var3', 'max'),
    mean_var4=('var4', 'min'))

On top of this aggregation, I want to sum var7 within each category of var6, per (var1, var2) group, and put the sums as new columns next to the aggregation. Below is what I would like to get:

var1 var2 var6_group0_sum var6_group1_sum var6_group2_sum
1    a        137         0               3
1    b        0           45              0
2    a        0           88              0
2    c        0           341             168
3    b        16          0               0

How can I achieve this within a grouped aggregation? Any help is appreciated.

Get the aggregation without the categorical columns (copied and pasted from your code):

left = (df.groupby(['var1', 'var2'])
         .agg(max_var3=('var3', 'max'),
              mean_var4=('var4', 'min'))
        )

Get the aggregation of just the categorical column, summing var7 per var6 category and unstacking the categories into columns:

right = (df.groupby(['var1', 'var2', 'var6'])
           .var7
           .sum()
           .unstack(-1, fill_value = 0)
           .rename(columns = lambda col: f'var6_group{col}_sum')
           .rename_axis(columns = None)
          )
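As a side note, the same wide table can probably be produced with pivot_table instead of groupby plus unstack. The following is only a sketch under stated assumptions: the dataframe is rebuilt here from the question's table (only the columns this step needs), and the name right_alt is hypothetical.

```python
import pandas as pd

# Columns var1, var2, var6, var7 copied from the question's table.
df = pd.DataFrame({
    'var1': [1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3],
    'var2': list('aaabaacccbb'),
    'var6': [0, 0, 2, 1, 1, 1, 1, 2, 2, 0, 0],
    'var7': [125, 12, 3, 45, 76, 12, 341, 73, 95, 11, 5],
})

# right_alt is a hypothetical name for the alternative to `right`.
right_alt = (df.pivot_table(index=['var1', 'var2'],
                            columns='var6',
                            values='var7',
                            aggfunc='sum',
                            fill_value=0)   # missing categories become 0
               .astype(int)                 # keep integers regardless of pandas version
               .rename(columns=lambda col: f'var6_group{col}_sum')
               .rename_axis(columns=None)
            )
print(right_alt)
```

Either form yields one row per (var1, var2) pair and one column per var6 category, so the next step works the same way with both.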

Combine the two dataframes column-wise; note that they share the same (var1, var2) index:

pd.concat([left, right], axis = 1)

           max_var3  mean_var4  var6_group0_sum  var6_group1_sum  var6_group2_sum
var1 var2                                                                        
1    a           12          0              137                0                3
     b           24          5                0               45                0
2    a            2         19                0               88                0
     c           57          1                0              341              168
3    b          119          0               16                0                0
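For anyone who wants to reproduce this end to end, here is a self-contained sketch of the steps above. It assumes the question's table is parsed from a whitespace-separated string; the names raw and result are hypothetical, and the final reset_index() is an optional addition that turns var1 and var2 back into ordinary columns, as in the layout the question asked for.

```python
import io
import pandas as pd

# The question's table, parsed from whitespace-separated text.
raw = """var1 var2 var3 var4 var5 var6 var7
1 a 3 8 9 0 125
1 a 4 0 12 0 12
1 a 12 4 12 2 3
1 b 24 5 1 1 45
2 a 1 19 4 1 76
2 a 2 37 12 1 12
2 c 3 93 156 1 341
2 c 57 1 87 2 73
2 c 42 4 95 2 95
3 b 12 11 0 0 11
3 b 119 0 901 0 5
"""
df = pd.read_csv(io.StringIO(raw), sep=r'\s+')

# Step 1: ordinary named aggregations on the continuous columns.
# (mean_var4 keeps the question's name, although it computes the minimum.)
left = (df.groupby(['var1', 'var2'])
          .agg(max_var3=('var3', 'max'),
               mean_var4=('var4', 'min'))
       )

# Step 2: sum var7 per var6 category, then spread the categories into columns.
right = (df.groupby(['var1', 'var2', 'var6'])['var7']
           .sum()
           .unstack(-1, fill_value=0)
           .rename(columns=lambda col: f'var6_group{col}_sum')
           .rename_axis(columns=None)
        )

# Step 3: both frames are indexed by (var1, var2), so concat aligns them row by row.
result = pd.concat([left, right], axis=1).reset_index()
print(result)
```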

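Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.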

 