How to group and aggregate in pandas
I have a df like this. The values are projected values, so there are many value columns.
Customer seg value0 value1
A a 10 60
A b 20 50
A c 30 40
B a 40 30
B b 50 20
B c 60 10
I want to compute a value for each customer by referring to the seg column:
a - b - c (a minus b minus c)
The expected result is:
customer value0 value1
A -40 -30
B -70 0
How can I compute each value by grouping on Customer? So far I only have:
df.groupby('Customer')
Thanks
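The idea is to multiply the rows that are not a by -1 and then aggregate with sum.
This works for any number of value columns: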
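import numpy as np

# keep only the a, b, c rows; copy so the in-place multiply does not act on a slice of df
df1 = df[df['seg'].isin(['a','b','c'])].copy()
# +1 for the 'a' rows, -1 for every other row
a = np.where(df1['seg'].eq('a'), 1, -1)
# flip the sign of the value columns (every column from position 2 onward)
df1.iloc[:, 2:] *= a[:, None]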
print (df1)
Customer seg value0 value1
0 A a 10 60
1 A b -20 -50
2 A c -30 -40
3 B a 40 30
4 B b -50 -20
5 B c -60 -10
df2 = df1.groupby('Customer', as_index=False).sum(numeric_only=True)
print (df2)
Customer value0 value1
0 A -40 -30
1 B -70 0
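For reference, the steps above can be collected into one self-contained sketch. The sample frame is rebuilt from the table in the question; the helper name signed_sum and its positive and segs parameters are hypothetical, introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'seg': ['a', 'b', 'c', 'a', 'b', 'c'],
    'value0': [10, 20, 30, 40, 50, 60],
    'value1': [60, 50, 40, 30, 20, 10],
})

def signed_sum(frame, positive='a', segs=('a', 'b', 'c')):
    """Hypothetical helper: add the `positive` segment, subtract the others."""
    # Keep only the segments of interest; copy so the input frame is not modified
    out = frame[frame['seg'].isin(segs)].copy()
    # Numeric columns are treated as the value columns
    cols = list(out.select_dtypes('number').columns)
    # +1 for the positive segment, -1 for every other row
    sign = np.where(out['seg'].eq(positive), 1, -1)
    out[cols] = out[cols].mul(sign, axis=0)
    # Sum the signed values per customer
    return out.groupby('Customer', as_index=False)[cols].sum()

result = signed_sum(df)
print(result)
```

Selecting the value columns explicitly before calling sum keeps the string column seg out of the aggregation, whatever the pandas version does with non-numeric columns by default.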
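Or, if you want to pick the numeric columns by dtype rather than by position:
# keep only the a, b, c rows; copy so the assignment below does not act on a slice of df
df1 = df[df['seg'].isin(['a','b','c'])].copy()
# names of all numeric columns
c = df1.select_dtypes(np.number).columns
# +1 for the 'a' rows, -1 for every other row
a = np.where(df1['seg'].eq('a'), 1, -1)
df1[c] *= a[:, None]
df2 = df1.groupby('Customer', as_index=False).sum(numeric_only=True)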
print (df2)
Customer value0 value1
0 A -40 -30
1 B -70 0
How about this:
In [42]: df
Out[42]:
Customer seg value0 value1
0 A a 10 60
1 A b 20 50
2 A c 30 40
3 B a 40 30
4 B b 50 20
5 B c 60 10
In [43]: df.pivot(index='seg', columns='Customer').T.eval('a - b - c').unstack(level=0)
Out[43]:
value0 value1
Customer
A -40 -30
B -70 0
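The one-liner above is easier to follow when split into steps. The following is a minimal sketch, assuming the sample frame from the question; the intermediate names wide, diff and res are mine and do not appear in the original answer.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'seg': ['a', 'b', 'c', 'a', 'b', 'c'],
    'value0': [10, 20, 30, 40, 50, 60],
    'value1': [60, 50, 40, 30, 20, 10],
})

# Rows become seg (a, b, c); columns become (value column, Customer) pairs
wide = df.pivot(index='seg', columns='Customer')

# After transposing, a, b and c are ordinary columns that eval can refer to by name
diff = wide.T.eval('a - b - c')

# Move the value-column level back into the columns, leaving Customer as the index
res = diff.unstack(level=0)
print(res)
```

Keyword arguments are used for pivot because recent pandas releases no longer accept index and columns positionally.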
If you prefer groupby, here is another solution:
In [44]: df.groupby('Customer').apply(lambda x:
x.set_index('seg')[['value0', 'value1']].T.eval('a - b - c'))
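The output of this variant is not shown above, so here is a self-contained sketch for checking it. It assumes the sample frame from the question; the function name a_minus_b_minus_c is hypothetical and simply stands in for the lambda.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'Customer': ['A', 'A', 'A', 'B', 'B', 'B'],
    'seg': ['a', 'b', 'c', 'a', 'b', 'c'],
    'value0': [10, 20, 30, 40, 50, 60],
    'value1': [60, 50, 40, 30, 20, 10],
})

def a_minus_b_minus_c(group):
    # Index the group's rows by seg, keep the value columns, then transpose so
    # that a, b and c become columns that eval can reference by name
    values = group.set_index('seg')[['value0', 'value1']].T
    return values.eval('a - b - c')

# Each group yields a Series indexed by value0/value1, so the result is a
# DataFrame with one row per customer
res = df.groupby('Customer').apply(a_minus_b_minus_c)
print(res)
```

Depending on the pandas version, apply may emit a warning about operating on the grouping column; the computed values are not affected, since only seg, value0 and value1 are used inside the function.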