How to group and aggregate in pandas
I have a df like this. The values are projected values, so there are many value columns.
Customer seg value0 value1
A a 10 60
A b 20 50
A c 30 40
B a 40 30
B b 50 20
B c 60 10
I want to calculate values by referring to the seg column:
a - b - c (a minus b minus c)
for each customer:
customer value0 value1
A -40 -30
B -70 0
How can I calculate each value by grouping by customer?
df.groupby('Customer')
Thanks
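The idea is to multiply the values by -1 in the rows where seg is not a, then aggregate with sum:
import numpy as np
# keep only the a, b, c rows; copy so the in-place multiply does not touch df
df1 = df[df['seg'].isin(['a','b','c'])].copy()
# +1 for seg 'a', -1 for every other seg
a = np.where(df1['seg'].eq('a'), 1, -1)
# multiply the value columns (third column onward) row by row
df1.iloc[:, 2:] *= a[:, None]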
print (df1)
Customer seg value0 value1
0 A a 10 60
1 A b -20 -50
2 A c -30 -40
3 B a 40 30
4 B b -50 -20
5 B c -60 -10
df2 = df1.groupby('Customer', as_index=False).sum()
print (df2)
Customer value0 value1
0 A -40 -30
1 B -70 0
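For anyone who wants to paste and run this, here is a minimal self-contained sketch of the same sign-then-sum idea. It rebuilds the sample frame from the question, and it is a variant rather than the exact code above: it uses DataFrame.mul with axis=0 instead of modifying a filtered copy in place. The names sign, value_cols and result are made up for the illustration.

```python
import numpy as np
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B", "B"],
    "seg": ["a", "b", "c", "a", "b", "c"],
    "value0": [10, 20, 30, 40, 50, 60],
    "value1": [60, 50, 40, 30, 20, 10],
})

# +1 for seg 'a', -1 for every other seg, aligned on df's index
sign = pd.Series(np.where(df["seg"].eq("a"), 1, -1), index=df.index)

# assumption: these are the only value columns; list more if there are more
value_cols = ["value0", "value1"]

# multiply each row by its sign, then sum within each customer
result = df[value_cols].mul(sign, axis=0).groupby(df["Customer"]).sum()
print(result)
```

Since nothing is modified in place, df stays untouched and there is no need for a .copy().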
Or, if you want to multiply all numeric columns:
df1 = df[df['seg'].isin(['a','b','c'])].copy()
c = df1.select_dtypes(np.number).columns
a = np.where(df1['seg'].eq('a'), 1, -1)
df1[c] *= a[:, None]
df2 = df1.groupby('Customer', as_index=False).sum()
print (df2)
Customer value0 value1
0 A -40 -30
1 B -70 0
How about this:
In [42]: df
Out[42]:
Customer seg value0 value1
0 A a 10 60
1 A b 20 50
2 A c 30 40
3 B a 40 30
4 B b 50 20
5 B c 60 10
In [43]: df.pivot(index='seg', columns='Customer').T.eval('a - b - c').unstack(level=0)
Out[43]:
value0 value1
Customer
A -40 -30
B -70 0
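If the transpose plus eval chain is hard to follow, the same reshaping idea can be sketched step by step. This is a variant under the assumption that every customer has exactly one row per seg (pivot raises an error on duplicates). The names wide, a, b, c and result are only for illustration.

```python
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B", "B"],
    "seg": ["a", "b", "c", "a", "b", "c"],
    "value0": [10, 20, 30, 40, 50, 60],
    "value1": [60, 50, 40, 30, 20, 10],
})

# one row per customer; columns become a MultiIndex of (value column, seg)
wide = df.pivot(index="Customer", columns="seg")

# pull out each seg across all value columns
a = wide.xs("a", axis=1, level="seg")
b = wide.xs("b", axis=1, level="seg")
c = wide.xs("c", axis=1, level="seg")

# a minus b minus c, column by column
result = a - b - c
print(result)
```

Each of a, b and c is a frame indexed by Customer with the value columns, so the subtraction lines up automatically.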
If you prefer groupby, here is another solution:
In [44]: df.groupby('Customer').apply(lambda x:
x.set_index('seg')[['value0', 'value1']].T.eval('a - b - c'))
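The groupby version may be easier to read with a named function instead of the lambda. A rough sketch, again assuming each customer has exactly one a, one b and one c row. a_minus_rest is a hypothetical helper name, and the columns are selected on the groupby so the function only sees seg and the value columns.

```python
import pandas as pd

# sample data copied from the question
df = pd.DataFrame({
    "Customer": ["A", "A", "A", "B", "B", "B"],
    "seg": ["a", "b", "c", "a", "b", "c"],
    "value0": [10, 20, 30, 40, 50, 60],
    "value1": [60, 50, 40, 30, 20, 10],
})

def a_minus_rest(group):
    """Return a - b - c for one customer's rows (hypothetical helper)."""
    by_seg = group.set_index("seg")[["value0", "value1"]]
    # each .loc[...] is a Series indexed by value0, value1
    return by_seg.loc["a"] - by_seg.loc["b"] - by_seg.loc["c"]

# apply the helper per customer; the returned Series become the rows
result = df.groupby("Customer")[["seg", "value0", "value1"]].apply(a_minus_rest)
print(result)
```

This will fail with a KeyError if some customer is missing one of the segs, which may actually be what you want as a sanity check.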