How to aggregate multiple columns in pandas

Question

Here is the snippet:

import pandas as pd
test = pd.DataFrame({'userid': [1,1,1,2,2], 'order_id': [1,2,3,4,5], 'fee': [2,1,5,3,1]})

I'd like to group by userid, count the 'order_id' column, and sum the 'fee' column:

test.groupby('userid').order_id.count()
test.groupby('userid').fee.sum()

Is it possible to perform these two operations in one line of code, so that I get a resulting DataFrame that looks like this:

userid    counts    sum
...

I've tried pivot_table:

test.pivot_table(index='userid', values=['order_id', 'fee'], aggfunc=[np.size, np.sum])

It gives something like this:

          size            sum
           fee order_id   fee order_id
userid
1            3        3     8        6
2            2        2     4        9

Is it possible to tell pandas to apply np.size to one column and np.sum to the other, instead of applying both functions to both columns?

Answer 1

Use DataFrameGroupBy.agg and then rename the columns:

d = {'order_id': 'counts', 'fee': 'sum'}
# Parentheses let the method chain span several lines
df = (test.groupby('userid')
          .agg({'order_id': 'count', 'fee': 'sum'})
          .rename(columns=d)
          .reset_index())
print(df)
   userid  sum  counts
0       1    8       3
1       2    4       2

However, it is better to aggregate with size here, because count is meant for cases where NaN values need to be excluded:

# size counts every row in the group, including rows that hold NaN
df = (test.groupby('userid')
          .agg({'order_id': 'size', 'fee': 'sum'})
          .rename(columns=d)
          .reset_index())
print(df)
   userid  sum  counts
0       1    8       3
1       2    4       2
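
As a possible alternative, assuming pandas 0.25 or newer, named aggregation can set the output column names directly, so the separate rename step is not needed. The sketch below also compares size and count on a copy of the data in which one order_id is replaced by NaN; the names with_nan, compare, n_size and n_count are made up for this illustration and do not come from the original answer.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'userid': [1, 1, 1, 2, 2],
                     'order_id': [1, 2, 3, 4, 5],
                     'fee': [2, 1, 5, 3, 1]})

# Named aggregation: output_column=(input_column, function)
df = (test.groupby('userid')
          .agg(counts=('order_id', 'size'), sum=('fee', 'sum'))
          .reset_index())
print(df)

# Hypothetical variant of the data with one missing order_id,
# used only to show how size and count differ
with_nan = test.assign(order_id=[1, np.nan, 3, 4, 5])
compare = (with_nan.groupby('userid')
                   .agg(n_size=('order_id', 'size'),     # counts all rows
                        n_count=('order_id', 'count')))  # skips NaN
print(compare)
```

For userid 1, n_size stays at 3 while n_count drops to 2, which is the difference the answer describes.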

Solution 1
3 Accepted 2017-08-31 08:43:04
