
Pandas: how to aggregate more than one column

Here is the snippet:

import numpy as np
import pandas as pd

test = pd.DataFrame({'userid': [1,1,1,2,2], 'order_id': [1,2,3,4,5], 'fee': [2,1,5,3,1]})

I'd like to group by userid, count the 'order_id' column, and sum the 'fee' column:

test.groupby('userid').order_id.count()
test.groupby('userid').fee.sum()

Is it possible to perform both operations in one line of code, so that the resulting DataFrame looks like this:

userid    counts    sum
...

I've tried pivot_table:

test.pivot_table(index='userid', values=['order_id', 'fee'], aggfunc=[np.size, np.sum])

It gives something like this:

       size            sum
        fee order_id   fee order_id
userid
1         3        3     8        6
2         2        2     4        9

Is it possible to tell pandas to apply np.size to one column and np.sum to the other, instead of applying both functions to both columns?

Use DataFrameGroupBy.agg with a dict that maps each column to its function, then rename the columns:

d = {'order_id': 'counts', 'fee': 'sum'}
# The outer parentheses let the method chain span several lines.
df = (test.groupby('userid')
          .agg({'order_id': 'count', 'fee': 'sum'})
          .rename(columns=d)
          .reset_index())
print(df)
   userid  counts  sum
0       1       3    8
1       2       2    4
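
On pandas 0.25 or later, named aggregation is an alternative that sets the output column names inside agg itself, so the separate rename step is not needed. This is a minimal sketch assuming that pandas version; the variable name result is only for illustration:

```python
import pandas as pd

test = pd.DataFrame({'userid': [1, 1, 1, 2, 2],
                     'order_id': [1, 2, 3, 4, 5],
                     'fee': [2, 1, 5, 3, 1]})

# Each keyword is an output column name;
# each tuple is (input column, aggregation function).
result = (test.groupby('userid')
              .agg(counts=('order_id', 'count'), sum=('fee', 'sum'))
              .reset_index())
print(result)
```

The output columns appear in the order the keywords are written, so this gives userid, counts, sum.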

However, it is better to aggregate with size: count excludes NaN values, so use it only when missing values should be left out, whereas size counts every row in the group:

# d is the rename mapping defined above.
df = (test.groupby('userid')
          .agg({'order_id': 'size', 'fee': 'sum'})
          .rename(columns=d)
          .reset_index())
print(df)
   userid  counts  sum
0       1       3    8
1       2       2    4
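
The two functions only differ when a group contains missing values. To illustrate, here is a small sketch on a hypothetical variant of the question's data in which one order_id is NaN (test_nan, n_non_null and n_rows are names made up for this example):

```python
import numpy as np
import pandas as pd

# Hypothetical variant of the question's data: one order_id is missing.
test_nan = pd.DataFrame({'userid': [1, 1, 1, 2, 2],
                         'order_id': [1, np.nan, 3, 4, 5],
                         'fee': [2, 1, 5, 3, 1]})

grouped = test_nan.groupby('userid')['order_id']
n_non_null = grouped.count()  # skips the NaN row
n_rows = grouped.size()       # counts every row in the group

print(pd.DataFrame({'count': n_non_null, 'size': n_rows}))
```

For userid 1, count returns 2 while size returns 3; for userid 2 both return 2.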
