Pandas: how to aggregate more than one column
Here is the snippet:
test = pd.DataFrame({'userid': [1,1,1,2,2], 'order_id': [1,2,3,4,5], 'fee': [2,1,5,3,1]})
I'd like to group by userid, count the 'order_id' column, and sum the 'fee' column:
test.groupby('userid').order_id.count()
test.groupby('userid').fee.sum()
Is it possible to perform these two operations in one line of code, so that I get a resulting df that looks like this:
userid counts sum
...
I've tried pivot_table:
test.pivot_table(index='userid', values=['order_id', 'fee'], aggfunc=[np.size, np.sum])
It gives something like this:
       size          sum
        fee order_id fee order_id
userid
1         3        3   8        6
2         2        2   4        9
Is it possible to tell pandas to apply np.size to one column and np.sum to the other, rather than applying both functions to both columns?
Use DataFrameGroupBy.agg with a dict that maps each column to its aggregation, then rename the columns:
d = {'order_id': 'counts', 'fee': 'sum'}
# Parentheses let the method chain span several lines
df = (test.groupby('userid')
          .agg({'order_id': 'count', 'fee': 'sum'})
          .rename(columns=d)
          .reset_index())
print(df)
   userid  sum  counts
0       1    8       3
1       2    4       2
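As a sketch of an alternative, assuming pandas 0.25 or later, named aggregation can set the output column names directly, so the separate rename step is not needed. The output names counts and sum below simply mirror the desired result in the question; with this form the columns come out in the order the keywords are written.

```python
import pandas as pd

test = pd.DataFrame({'userid': [1, 1, 1, 2, 2],
                     'order_id': [1, 2, 3, 4, 5],
                     'fee': [2, 1, 5, 3, 1]})

# Named aggregation: output_name=(input_column, aggregation)
df = (test.groupby('userid')
          .agg(counts=('order_id', 'count'), sum=('fee', 'sum'))
          .reset_index())
print(df)
```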
It is better, however, to aggregate with size, because count is meant for cases where NaN values need to be excluded:
df = (test.groupby('userid')
          .agg({'order_id': 'size', 'fee': 'sum'})
          .rename(columns=d)
          .reset_index())
print(df)
   userid  sum  counts
0       1    8       3
1       2    4       2
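The difference between size and count only shows up when the column contains missing values. A minimal sketch, assuming a hypothetical variant of the data (test_nan, not part of the original question) in which one order_id is NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: same as test, but one order_id is missing
test_nan = pd.DataFrame({'userid': [1, 1, 1, 2, 2],
                         'order_id': [1, np.nan, 3, 4, 5],
                         'fee': [2, 1, 5, 3, 1]})

g = test_nan.groupby('userid')['order_id']

# count skips NaN values; size counts every row in the group
compare = pd.DataFrame({'count': g.count(), 'size': g.size()})
print(compare)
```

For userid 1, count reports 2 while size reports 3, so size is the one to use when every row should be counted regardless of missing values.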