pandas: how to aggregate a sum on a column depending on values in other columns
I am trying to sum the values in one column, grouped by the values in a second column, while also taking the values in a third column into account. The df looks like:
id memo amount
1 pos 1.0
1 pos 2.0
1 neg 3.0
2 pos 4.0
2 pos 5.0
2 neg 6.0
2 neg 7.0
I want to group by id and sum amount, but within each group the sign of each amount depends on memo: if memo is pos the amount counts as positive, and if it is neg it counts as negative. For example, for id 1 the total amount is 0, since 1.0 + 2.0 - 3.0 = 0.
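Written out, the rule is a signed sum per id group. Here g is the set of rows sharing one id, a_i is a row's amount and s_i is a sign taken from its memo; these symbols are introduced only for illustration:

$$
\text{total}_g = \sum_{i \in g} s_i \, a_i, \qquad
s_i = \begin{cases} +1 & \text{if memo}_i = \text{pos} \\ -1 & \text{if memo}_i = \text{neg} \end{cases}
$$

$$
\text{total}_1 = 1.0 + 2.0 - 3.0 = 0.0, \qquad
\text{total}_2 = 4.0 + 5.0 - 6.0 - 7.0 = -4.0
$$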
If I do df.groupby('id')['amount'].sum(), it only considers the id and amount columns. I am wondering how to also take memo into account here.
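To see the problem concretely, here is a minimal sketch that rebuilds the sample frame from the table above and runs the plain groupby sum. Because memo is never consulted, every amount is added as a positive number.

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 2],
    'memo': ['pos', 'pos', 'neg', 'pos', 'pos', 'neg', 'neg'],
    'amount': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})

# A plain groupby sum ignores memo: id 1 -> 1+2+3 = 6.0, id 2 -> 4+5+6+7 = 22.0
plain = df.groupby('id')['amount'].sum()
print(plain)
```

The signed totals the question asks for (0.0 and -4.0) therefore need the memo column folded into the values before grouping.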
So the result should look like:
id memo amount total_amount
1 pos 1.0 0.0
1 pos 2.0 0.0
1 neg 3.0 0.0
2 pos 4.0 -4.0
2 pos 5.0 -4.0
2 neg 6.0 -4.0
2 neg 7.0 -4.0
By splitting the operation into two steps, you can achieve what you want:
import numpy as np
df['temp'] = np.where(df.memo == 'pos', df.amount, -df.amount)  # pos -> +amount, neg -> -amount
df['total_amount'] = df.groupby('id')['temp'].transform('sum')  # per-id total, repeated on every row
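A self-contained version of the two-step approach, rebuilding the question's sample frame so it can be run as-is. It passes the string 'sum' to transform instead of the builtin sum; to our knowledge recent pandas releases emit a FutureWarning for builtin callables there, and the string form selects the same groupby sum. Dropping the helper column at the end is an optional extra, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 2],
    'memo': ['pos', 'pos', 'neg', 'pos', 'pos', 'neg', 'neg'],
    'amount': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})

# Step 1: sign each amount from memo (pos -> +amount, neg -> -amount)
df['temp'] = np.where(df['memo'] == 'pos', df['amount'], -df['amount'])

# Step 2: sum the signed amounts per id and broadcast the total back to every row
df['total_amount'] = df.groupby('id')['temp'].transform('sum')

# Optional: remove the helper column once the total has been computed
df = df.drop(columns='temp')
print(df)
```

transform (rather than sum) is what keeps one output value per input row, which is why every row of a group ends up showing the same total.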
Another fun way, with mapping and multiplying:
sign = df.set_index('id')['memo'].map({'pos': 1, 'neg': -1})  # +1/-1 per row, indexed by id
df['new'] = (sign * df['amount'].values).groupby(level=0).transform('sum').values  # per-id total
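A runnable sketch of the mapping approach on the question's data, split into named steps (sign and signed are illustrative variable names). The .values calls matter: multiplying by a plain NumPy array skips index alignment, so this relies on the rows staying in their original order, and the final .values drops the id index so the result can be assigned back to the default RangeIndex.

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 2],
    'memo': ['pos', 'pos', 'neg', 'pos', 'pos', 'neg', 'neg'],
    'amount': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})

# Map memo to a +1/-1 sign; set_index('id') puts id in the index for the later groupby
sign = df.set_index('id')['memo'].map({'pos': 1, 'neg': -1})

# Multiply by the raw amount array (positional, no index alignment)
signed = sign * df['amount'].values

# Group on the id index level, broadcast each group's total, then drop the index
df['new'] = signed.groupby(level=0).transform('sum').values
print(df)
```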
Output:
id memo amount new
0 1 pos 1.0 0.0
1 1 pos 2.0 0.0
2 1 neg 3.0 0.0
3 2 pos 4.0 -4.0
4 2 pos 5.0 -4.0
5 2 neg 6.0 -4.0
6 2 neg 7.0 -4.0
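If you want one row per id instead of the total repeated on every row, the signed amounts can be collapsed with .sum() rather than .transform(). This sketch swaps in Series.where as an alternative to np.where for the signing step: it keeps amount where memo is pos and substitutes -amount elsewhere, so it assumes memo only ever holds pos or neg.

```python
import pandas as pd

# Sample data rebuilt from the question's table
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 2, 2],
    'memo': ['pos', 'pos', 'neg', 'pos', 'pos', 'neg', 'neg'],
    'amount': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0],
})

# Signed amount: keep +amount where memo is 'pos', otherwise use -amount
signed = df['amount'].where(df['memo'] == 'pos', -df['amount'])

# Group the signed Series by the id column and collapse to one total per id
totals = signed.groupby(df['id']).sum()
print(totals)
```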