
Pandas: how to aggregate more than one column

Here is the snippet:

import numpy as np
import pandas as pd

test = pd.DataFrame({'userid': [1,1,1,2,2], 'order_id': [1,2,3,4,5], 'fee': [2,1,5,3,1]})

I'd like to group by userid, count the 'order_id' column, and sum the 'fee' column:

test.groupby('userid').order_id.count()
test.groupby('userid').fee.sum()

Is it possible to perform these two operations in one line of code, so that the resulting DataFrame looks like this:

userid    counts    sum
...

I've tried pivot_table:

test.pivot_table(index='userid', values=['order_id', 'fee'], aggfunc=[np.size, np.sum])

It gives something like this:

       size          sum
        fee order_id fee order_id
userid
1         3        3   8        6
2         2        2   4        9

Is it possible to tell pandas to apply np.size to one column and np.sum to the other, rather than applying both functions to both columns?

Use DataFrameGroupBy.agg with a dict that maps each column to its aggregation, then rename the columns:

d = {'order_id': 'counts', 'fee': 'sum'}
# The outer parentheses let the method chain span several lines
df = (test.groupby('userid')
          .agg({'order_id': 'count', 'fee': 'sum'})
          .rename(columns=d)
          .reset_index())
print(df)
   userid  counts  sum
0       1       3    8
1       2       2    4
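
As a possible alternative to the dict-plus-rename approach above, recent pandas versions (0.25 and later, to my knowledge) support named aggregation, where each keyword argument names an output column and maps it to an (input column, aggregation) pair. The sketch below reuses the question's data; the output names counts and sum are taken from the question's desired result.

```python
import pandas as pd

test = pd.DataFrame({'userid': [1, 1, 1, 2, 2],
                     'order_id': [1, 2, 3, 4, 5],
                     'fee': [2, 1, 5, 3, 1]})

# Named aggregation: output_column=(input_column, aggregation)
result = (
    test.groupby('userid')
        .agg(counts=('order_id', 'count'), sum=('fee', 'sum'))
        .reset_index()
)
print(result)
```

This names the output columns in the same call, so no separate rename step is needed.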

However, aggregating with size is usually the better choice here: count excludes NaN values, whereas size counts every row in the group:

# d is the rename mapping defined above
df = (test.groupby('userid')
          .agg({'order_id': 'size', 'fee': 'sum'})
          .rename(columns=d)
          .reset_index())
print(df)
   userid  counts  sum
0       1       3    8
1       2       2    4
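
To illustrate the difference between count and size, here is a minimal sketch using a hypothetical variant of the question's data in which one order_id is missing. The frame demo and the variable names are assumptions made for illustration only; they are not part of the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: one order_id is NaN
demo = pd.DataFrame({'userid': [1, 1, 1, 2, 2],
                     'order_id': [1, np.nan, 3, 4, 5],
                     'fee': [2, 1, 5, 3, 1]})

grouped = demo.groupby('userid')['order_id']
non_null = grouped.count()  # counts only non-NaN values per group
all_rows = grouped.size()   # counts every row per group, NaN included

print(pd.DataFrame({'count': non_null, 'size': all_rows}))
```

With this data, user 1 has three rows but only two non-missing order_id values, so the two aggregations disagree for that group. The two give the same result on the question's original data because it contains no missing values.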
