
How to group and aggregate in pandas

I have a DataFrame like the one below. The values are projections, so there are many value columns.

Customer seg value0  value1   
A         a   10      60
A         b   20      50
A         c   30      40
B         a   40      30
B         b   50      20
B         c   60      10
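
To reproduce the question, the table above can be built as a DataFrame roughly like this. This is a minimal sketch: only the two value columns shown are included, and the variable name df is taken from the question.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B", "B"],
    "seg": ["a", "b", "c", "a", "b", "c"],
    "value0": [10, 20, 30, 40, 50, 60],
    "value1": [60, 50, 40, 30, 20, 10],
})
print(df)
```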

For each customer, I would like to calculate a - b - c (the a row minus the b row minus the c row), where a, b, and c refer to the values in the seg column.

The expected result:

customer value0 value1 
A        -40     -30   
B        -70      0

How can I calculate these values by grouping on the customer? So far I only have:

df.groupby('Customer')

Thanks

The idea is to multiply the values that should be subtracted by -1 and then aggregate with sum:

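import numpy as np

# keep only the a, b, c rows; .copy() avoids a SettingWithCopyWarning on the in-place multiply below
df1 = df[df['seg'].isin(['a','b','c'])].copy()

# +1 for the 'a' rows, -1 for the rows to subtract
a = np.where(df1['seg'].eq('a'), 1, -1)
# multiply every column from the third one on (the value columns) row by row
df1.iloc[:, 2:] *= a[:, None]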

print (df1)
  Customer seg  value0  value1
0        A   a      10      60
1        A   b     -20     -50
2        A   c     -30     -40
3        B   a      40      30
4        B   b     -50     -20
5        B   c     -60     -10

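# numeric_only=True keeps the string column 'seg' out of the sum (recent pandas versions would otherwise concatenate it)
df2 = df1.groupby('Customer', as_index=False).sum(numeric_only=True)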
print (df2)
  Customer  value0  value1
0        A     -40     -30
1        B     -70       0
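
The same idea can also be written without modifying a copy of the frame in place. The following is only a sketch, assuming the sample data from the question and that the value columns are named value0 and value1; the names sign and signed are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B", "B"],
    "seg": ["a", "b", "c", "a", "b", "c"],
    "value0": [10, 20, 30, 40, 50, 60],
    "value1": [60, 50, 40, 30, 20, 10],
})

value_cols = ["value0", "value1"]

# +1 for seg 'a', -1 for every other seg (one sign per row).
sign = df["seg"].eq("a").map({True: 1, False: -1})

# Multiply each row of the value columns by its sign; axis=0 aligns on the row index.
signed = df[value_cols].mul(sign, axis=0)

# Sum the signed values per customer.
result = signed.groupby(df["Customer"]).sum()
print(result)
```

Grouping by the Series df["Customer"] keeps the string column seg out of the aggregation entirely, so no numeric_only handling is needed.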

Or, if you want to multiply all numeric columns:

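# .copy() avoids a SettingWithCopyWarning on the in-place multiply below
df1 = df[df['seg'].isin(['a','b','c'])].copy()
# names of all numeric columns
c = df1.select_dtypes(np.number).columns

a = np.where(df1['seg'].eq('a'), 1, -1)
df1[c] *= a[:, None]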

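# numeric_only=True keeps the string column 'seg' out of the sum (recent pandas versions would otherwise concatenate it)
df2 = df1.groupby('Customer', as_index=False).sum(numeric_only=True)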
print (df2)
  Customer  value0  value1
0        A     -40     -30
1        B     -70       0
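
If the formula might change later (for example a - b + c), one possible generalization is to keep a weight per segment in a dict. This is a hedged sketch, not part of the answer above: the helper weighted_group_sum and the weights mapping are hypothetical names, and the sample data is copied from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B", "B"],
    "seg": ["a", "b", "c", "a", "b", "c"],
    "value0": [10, 20, 30, 40, 50, 60],
    "value1": [60, 50, 40, 30, 20, 10],
})

# Hypothetical mapping that encodes "a - b - c".
weights = {"a": 1, "b": -1, "c": -1}


def weighted_group_sum(frame, weights, key="Customer", seg="seg"):
    """Sum the numeric columns per `key`, each row scaled by weights[seg]."""
    # Keep only rows whose segment has a weight.
    kept = frame[frame[seg].isin(list(weights))]
    # All numeric columns are treated as value columns.
    num_cols = kept.select_dtypes("number").columns
    # One weight per row, looked up from the segment label.
    w = kept[seg].map(weights)
    return kept[num_cols].mul(w, axis=0).groupby(kept[key]).sum()


result = weighted_group_sum(df, weights)
print(result)
```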

How about this:

In [42]: df
Out[42]:
  Customer seg  value0  value1
0        A   a      10      60
1        A   b      20      50
2        A   c      30      40
3        B   a      40      30
4        B   b      50      20
5        B   c      60      10

In [43]: df.pivot(index='seg', columns='Customer').T.eval('a - b - c').unstack(level=0)
Out[43]:
          value0  value1
Customer
A            -40     -30
B            -70       0
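
For readers who find the eval string hard to follow, the pivot step can be spelled out with explicit row lookups. This is a sketch under the assumption that every customer has exactly one row per segment (pivot raises an error on duplicates); the names wide and diff are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B", "B"],
    "seg": ["a", "b", "c", "a", "b", "c"],
    "value0": [10, 20, 30, 40, 50, 60],
    "value1": [60, 50, 40, 30, 20, 10],
})

# Rows become the segments a, b, c; columns become (value column, Customer) pairs.
wide = df.pivot(index="seg", columns="Customer")

# Row-wise arithmetic on the segments: a - b - c for every (value column, Customer) pair.
diff = wide.loc["a"] - wide.loc["b"] - wide.loc["c"]

# Move the value-column level back to the columns, leaving Customer as the index.
result = diff.unstack(level=0)
print(result)
```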

If you prefer groupby, there is another solution:

In [44]: df.groupby('Customer').apply(lambda x: 
            x.set_index('seg')[['value0', 'value1']].T.eval('a - b - c'))
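
A self-contained run of the groupby variant might look like the sketch below. Selecting the columns before apply is my own addition (it keeps the grouping column out of the frames passed to the lambda); the sample data is copied from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B", "B"],
    "seg": ["a", "b", "c", "a", "b", "c"],
    "value0": [10, 20, 30, 40, 50, 60],
    "value1": [60, 50, 40, 30, 20, 10],
})

# For each customer: index by seg, transpose so a, b, c become columns,
# then evaluate a - b - c once per value column.
result = (
    df.groupby("Customer")[["seg", "value0", "value1"]]
      .apply(lambda x: x.set_index("seg")[["value0", "value1"]].T.eval("a - b - c"))
)
print(result)
```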

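Another approach: use numpy's subtract combined with reduce. np.subtract.reduce computes first - second - third, so this relies on the rows of each customer being ordered a, b, c: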

import numpy as np

(df.groupby('Customer')
   .agg(value0=('value0', np.subtract.reduce),
        value1=('value1', np.subtract.reduce))
)


          value0  value1
Customer
A            -40     -30
B            -70       0
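
Because the reduce approach depends on row order, it may be safer to sort first. The sketch below assumes the segment labels sort alphabetically into the order a, b, c; the reversed frame shuffled is made up to show the effect, and the lambda passes a plain NumPy array to np.subtract.reduce.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B", "B"],
    "seg": ["a", "b", "c", "a", "b", "c"],
    "value0": [10, 20, 30, 40, 50, 60],
    "value1": [60, 50, 40, 30, 20, 10],
})

# np.subtract.reduce folds left to right: (10 - 20) - 30.
print(np.subtract.reduce(np.array([10, 20, 30])))

# Hypothetical input whose rows are not in a, b, c order.
shuffled = df.iloc[::-1]

# Restore the a, b, c order inside each customer before reducing.
ordered = shuffled.sort_values(["Customer", "seg"])


def first_minus_rest(s):
    # First element minus all following elements of the group.
    return np.subtract.reduce(s.to_numpy())


result = ordered.groupby("Customer").agg(
    value0=("value0", first_minus_rest),
    value1=("value1", first_minus_rest),
)
print(result)
```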

See the NumPy documentation for numpy.ufunc.reduce and numpy.subtract.
