How to group and aggregate in pandas
I have a df like this. These are projected values, so there are many value columns.
Customer seg value0 value1
A a 10 60
A b 20 50
A c 30 40
B a 40 30
B b 50 20
B c 60 10
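To reproduce the examples, the sample frame above can be built as follows. This is a minimal sketch: it includes only the two value columns shown, while the real data reportedly has many more.

```python
import pandas as pd

# Sample data from the question: one row per (Customer, seg) pair
df = pd.DataFrame(
    {
        "Customer": ["A", "A", "A", "B", "B", "B"],
        "seg": ["a", "b", "c", "a", "b", "c"],
        "value0": [10, 20, 30, 40, 50, 60],
        "value1": [60, 50, 40, 30, 20, 10],
    }
)
print(df)
```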
I would like to calculate, for each customer, a - b - c (the value of seg a minus the value of seg b minus the value of seg c) by referring to the seg column:
customer value0 value1
A -40 -30
B -70 0
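Spelled out, each expected value is the seg a row minus the seg b and seg c rows of the same customer and column:

```latex
\begin{aligned}
\text{A:}\quad \text{value0} &= 10 - 20 - 30 = -40, & \text{value1} &= 60 - 50 - 40 = -30\\
\text{B:}\quad \text{value0} &= 40 - 50 - 60 = -70, & \text{value1} &= 30 - 20 - 10 = 0
\end{aligned}
```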
How can I calculate these values for each customer by grouping?
df.groupby('Customer')
Thanks
The idea is to multiply the values to be subtracted (the b and c rows) by -1 and then aggregate with sum:
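import numpy as np

#filter only a,b,c rows (.copy() avoids SettingWithCopyWarning on the assignment below)
df1 = df[df['seg'].isin(['a','b','c'])].copy()
#+1 for seg a, -1 for seg b and c
a = np.where(df1['seg'].eq('a'), 1, -1)
#multiply every value column (3rd column onwards) by the sign of its row
df1.iloc[:, 2:] *= a[:, None]
print(df1)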
Customer seg value0 value1
0 A a 10 60
1 A b -20 -50
2 A c -30 -40
3 B a 40 30
4 B b -50 -20
5 B c -60 -10
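#drop the text column seg first; newer pandas versions would otherwise concatenate its strings
df2 = df1.drop(columns='seg').groupby('Customer', as_index=False).sum()
print(df2)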
Customer value0 value1
0 A -40 -30
1 B -70 0
Or, if you want to multiply all numeric columns:
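df1 = df[df['seg'].isin(['a','b','c'])].copy()
#all numeric columns, however many there are
c = df1.select_dtypes(np.number).columns
a = np.where(df1['seg'].eq('a'), 1, -1)
df1[c] *= a[:, None]
#sum only the numeric columns (as_index=False keeps Customer as a column)
df2 = df1.groupby('Customer', as_index=False)[list(c)].sum()
print(df2)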
Customer value0 value1
0 A -40 -30
1 B -70 0
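A variant of the same sign-flip idea, sketched here as an assumption rather than part of the answer: store each seg's coefficient in a dict (the name `signs` is hypothetical) and multiply with `DataFrame.mul(..., axis=0)`, so that another combination such as a - b + c only needs a different dict.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Customer": ["A", "A", "A", "B", "B", "B"],
        "seg": ["a", "b", "c", "a", "b", "c"],
        "value0": [10, 20, 30, 40, 50, 60],
        "value1": [60, 50, 40, 30, 20, 10],
    }
)

# Hypothetical mapping: coefficient for each seg (a is kept, b and c are subtracted)
signs = {"a": 1, "b": -1, "c": -1}

# Keep only the segs that have a coefficient; copy so we don't modify a view of df
df1 = df[df["seg"].isin(list(signs))].copy()
value_cols = ["value0", "value1"]

# Multiply each row's value columns by that row's coefficient (aligned on the row index)
df1[value_cols] = df1[value_cols].mul(df1["seg"].map(signs), axis=0)

# Summing the signed rows per customer gives a - b - c
result = df1.groupby("Customer", as_index=False)[value_cols].sum()
print(result)
```

The dict keeps the arithmetic in one place, which may be easier to maintain than an `np.where` condition once more segs are involved.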
How about this:
In [42]: df
Out[42]:
Customer seg value0 value1
0 A a 10 60
1 A b 20 50
2 A c 30 40
3 B a 40 30
4 B b 50 20
5 B c 60 10
In [43]: df.pivot(index='seg', columns='Customer').T.eval('a - b - c').unstack(level=0)
Out[43]:
value0 value1
Customer
A -40 -30
B -70 0
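The pivot one-liner can be unpacked step by step to show why the transpose and the unstack are needed. This sketch rebuilds the sample data and passes keyword arguments to `pivot`, which pandas 2.0 and later require; the intermediate names `wide` and `per_pair` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Customer": ["A", "A", "A", "B", "B", "B"],
        "seg": ["a", "b", "c", "a", "b", "c"],
        "value0": [10, 20, 30, 40, 50, 60],
        "value1": [60, 50, 40, 30, 20, 10],
    }
)

# Rows: seg (a, b, c); columns: (value column, Customer) pairs
wide = df.pivot(index="seg", columns="Customer")

# Transpose so a, b and c become columns that eval() can refer to by name;
# the result is a Series indexed by (value column, Customer)
per_pair = wide.T.eval("a - b - c")

# Move the value-column level (level 0) back into the columns
result = per_pair.unstack(level=0)
print(result)
```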
If you prefer groupby, there is another solution:
In [44]: df.groupby('Customer').apply(lambda x:
x.set_index('seg')[['value0', 'value1']].T.eval('a - b - c'))
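A self-contained version of the groupby/apply solution is sketched below. Selecting the columns right after `groupby` keeps the Customer key out of the frames passed to the lambda, which sidesteps the deprecation warning that recent pandas versions emit when grouping columns are included in `apply`. Because every group returns a Series with the same index (value0, value1), pandas stacks the results into a DataFrame with one row per customer.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Customer": ["A", "A", "A", "B", "B", "B"],
        "seg": ["a", "b", "c", "a", "b", "c"],
        "value0": [10, 20, 30, 40, 50, 60],
        "value1": [60, 50, 40, 30, 20, 10],
    }
)

# For each customer: index the rows by seg, transpose so a, b, c become columns,
# then evaluate a - b - c once per value column
result = df.groupby("Customer")[["seg", "value0", "value1"]].apply(
    lambda x: x.set_index("seg")[["value0", "value1"]].T.eval("a - b - c")
)
print(result)
```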
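Another approach: use NumPy's subtract combined with reduce. np.subtract.reduce returns the first value minus each of the following ones, so this relies on the rows of each customer being in the order a, b, c: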
import numpy as np

(df.groupby('Customer')
   .agg(value0=('value0', np.subtract.reduce),
        value1=('value1', np.subtract.reduce))
)
value0 value1
Customer
A -40 -30
B -70 0
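Since the reduce approach depends on row order, one way to make it robust is to sort before grouping. This is a minimal sketch assuming, as in the sample, that the seg to keep positive (a) sorts before the segs to subtract; the shuffled copy only demonstrates that the original row order no longer matters.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Customer": ["A", "A", "A", "B", "B", "B"],
        "seg": ["a", "b", "c", "a", "b", "c"],
        "value0": [10, 20, 30, 40, 50, 60],
        "value1": [60, 50, 40, 30, 20, 10],
    }
)

# Deliberately shuffle the rows to simulate unordered input
shuffled = df.sample(frac=1, random_state=0)

result = (
    shuffled.sort_values(["Customer", "seg"])  # puts a, b, c in order within each customer
    .groupby("Customer")  # groupby keeps the sorted row order inside each group
    .agg(
        value0=("value0", np.subtract.reduce),
        value1=("value1", np.subtract.reduce),
    )
)
print(result)
```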