
How to group and aggregate in pandas

I have a DataFrame like this. The values are projections, so there are many value columns.

Customer seg value0  value1   
A         a   10      60
A         b   20      50
A         c   30      40
B         a   40      30
B         b   50      20
B         c   60      10
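To make the answers below reproducible, here is a sketch that rebuilds the sample frame from the table above. The values are copied from the table; the real data presumably has more value columns.

```python
import pandas as pd

# Sample frame copied from the table above (only value0 and value1 shown)
df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'seg': ['a', 'b', 'c', 'a', 'b', 'c'],
    'value0': [10, 20, 30, 40, 50, 60],
    'value1': [60, 50, 40, 30, 20, 10],
})
print(df)
```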

For each customer, I would like to calculate a new value from the rows identified by the seg column:

a - b - c (a minus b minus c)

The expected result is:

Customer  value0  value1
A            -40     -30
B            -70       0

How can I calculate these values after grouping by customer? I started with:

df.groupby('Customer')

Thanks

The idea is to multiply the values of the rows to be subtracted (b and c) by -1 and then aggregate with sum:

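import numpy as np
import pandas as pd

# filter only a, b, c rows; .copy() so the in-place update below
# does not trigger SettingWithCopyWarning
df1 = df[df['seg'].isin(['a','b','c'])].copy()

# +1 for the 'a' rows, -1 for the rows to subtract
a = np.where(df1['seg'].eq('a'), 1, -1)
df1.iloc[:, 2:] *= a[:, None]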

print (df1)
  Customer seg  value0  value1
0        A   a      10      60
1        A   b     -20     -50
2        A   c     -30     -40
3        B   a      40      30
4        B   b     -50     -20
5        B   c     -60     -10

df2 = df1.groupby('Customer', as_index=False).sum(numeric_only=True)
print (df2)
  Customer  value0  value1
0        A     -40     -30
1        B     -70       0

Or, to multiply all numeric columns (selected by dtype instead of by position):

df1 = df[df['seg'].isin(['a','b','c'])].copy()
# all numeric columns, wherever they are in the frame
c = df1.select_dtypes(np.number).columns

a = np.where(df1['seg'].eq('a'), 1, -1)
df1[c] *= a[:, None]

df2 = df1.groupby('Customer', as_index=False).sum(numeric_only=True)
print (df2)
  Customer  value0  value1
0        A     -40     -30
1        B     -70       0
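A variation on the same sign-and-sum idea, not taken from the answer itself: keep the coefficient for each seg in a dictionary (the name `weights` is hypothetical) and scale the rows with `DataFrame.mul(..., axis=0)`, which leaves `df` unmodified and so needs no `.copy()`. This sketch assumes the formula is always a weighted sum of the seg rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'seg': ['a', 'b', 'c', 'a', 'b', 'c'],
    'value0': [10, 20, 30, 40, 50, 60],
    'value1': [60, 50, 40, 30, 20, 10],
})

# Hypothetical coefficient table: a - b - c means +1 for a, -1 for b and c
weights = {'a': 1, 'b': -1, 'c': -1}

# every numeric column, however many value columns there are
value_cols = df.select_dtypes('number').columns

# seg labels missing from the dict map to NaN; those rows are left out
w = df['seg'].map(weights)
keep = w.notna()

# scale each row by its coefficient (df itself is not modified)
weighted = df.loc[keep, value_cols].mul(w[keep], axis=0)

# one row per customer: the weighted sum of its seg rows
result = weighted.groupby(df.loc[keep, 'Customer']).sum()
print(result)
```

Another formula, such as a + b - c, only needs a different dictionary.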

How about this:

In [42]: df
Out[42]:
  Customer seg  value0  value1
0        A   a      10      60
1        A   b      20      50
2        A   c      30      40
3        B   a      40      30
4        B   b      50      20
5        B   c      60      10

In [43]: df.pivot(index='seg', columns='Customer').T.eval('a - b - c').unstack(level=0)
Out[43]:
          value0  value1
Customer
A            -40     -30
B            -70       0
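The pivot can also be done the other way round, with customers as rows, so that no transpose or `eval` is needed. A sketch, assuming each (Customer, seg) pair occurs only once (otherwise `pivot` raises); the helper `by_seg` is a hypothetical name:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'seg': ['a', 'b', 'c', 'a', 'b', 'c'],
    'value0': [10, 20, 30, 40, 50, 60],
    'value1': [60, 50, 40, 30, 20, 10],
})

# one row per customer; columns become a (value column, seg) MultiIndex
wide = df.pivot(index='Customer', columns='seg')

def by_seg(label):
    # all value columns for one seg label, e.g. value0 and value1 of 'a'
    return wide.xs(label, axis=1, level='seg')

# frames align on Customer (rows) and value column name (columns)
result = by_seg('a') - by_seg('b') - by_seg('c')
print(result)
```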

If you prefer groupby, there is another solution:

In [44]: df.groupby('Customer').apply(lambda x: 
            x.set_index('seg')[['value0', 'value1']].T.eval('a - b - c'))
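The same groupby idea can be written with a named function instead of `eval`. Selecting the columns before `apply` keeps the grouping column out of each group, which recent pandas versions deprecate passing to `apply`. A sketch; `a_minus_b_minus_c` is a hypothetical helper name:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'seg': ['a', 'b', 'c', 'a', 'b', 'c'],
    'value0': [10, 20, 30, 40, 50, 60],
    'value1': [60, 50, 40, 30, 20, 10],
})

def a_minus_b_minus_c(group):
    # look rows up by their seg label inside this customer's group
    g = group.set_index('seg')
    return g.loc['a'] - g.loc['b'] - g.loc['c']

# select columns first so 'Customer' (the grouping column) is not passed in
result = df.groupby('Customer')[['seg', 'value0', 'value1']].apply(a_minus_b_minus_c)
print(result)
```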

Another approach: use NumPy's subtract ufunc combined with reduce, which computes a - b - c within each group as long as the rows of each customer are ordered a, b, c:

(df.groupby('Customer')
   .agg(value0=('value0',np.subtract.reduce),
        value1=('value1',np.subtract.reduce))
 )


          value0  value1
Customer
A            -40     -30
B            -70       0

References: numpy reduce (numpy.ufunc.reduce), numpy subtract (numpy.subtract)
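`np.subtract.reduce` folds left to right, so it yields a - b - c only when the a row comes first in each group. The sketch below makes that order explicit by sorting on seg first (assuming the labels sort as a, b, c, as in the sample) and writes the fold as the equivalent "first value minus the sum of the rest". The rows are shuffled on purpose to show that the input order no longer matters.

```python
import numpy as np
import pandas as pd

# np.subtract.reduce folds left to right: (10 - 20) - 30
print(np.subtract.reduce([10, 20, 30]))  # -40

df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'seg': ['a', 'b', 'c', 'a', 'b', 'c'],
    'value0': [10, 20, 30, 40, 50, 60],
    'value1': [60, 50, 40, 30, 20, 10],
})
# shuffle the rows so the original order cannot be relied on
df = df.sample(frac=1, random_state=0)

# sort so each customer's 'a' row comes first (assumes labels sort as a, b, c)
ordered = df.sort_values(['Customer', 'seg'])

# a - b - c equals the first value minus the sum of the remaining values
result = ordered.groupby('Customer')[['value0', 'value1']].agg(
    lambda s: s.iloc[0] - s.iloc[1:].sum()
)
print(result)
```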
