Python Pandas: group by and compute average, median, and count

Suppose I have a dataframe that looks like this

import pandas as pd

d = {'User' : ['A', 'A', 'B', 'C', 'C', 'C'],
     'time':[1,2,3,4,4,4],
     'state':['CA', 'CA', 'ID', 'OR','OR','OR']}
df = pd.DataFrame(data = d)

Now suppose I want to create a new dataframe that takes the average and median of time for each user, grabs the user's state, and also generates a new column that counts the number of times that user appears in the User column, i.e.

d = {'User' : ['A', 'B', 'C'],
     'avg_time':[1.5,3,4],
     'median_time':[1.5,3,4],
     'state':['CA','ID','OR'],
     'user_count':[2,1,3]}

df_res = pd.DataFrame(data=d)

I know that I can do a group by mean statement like this

df.groupby('User')['time'].mean()

This gives me a pandas Series, and I assume I could turn it into a dataframe if I wanted, but how would I also compute all the other columns I am interested in?

Try named aggregation (each keyword=(column, function) tuple is shorthand for pd.NamedAgg):

df.groupby('User').agg(avg_time=('time','mean'),
                       median_time=('time','median'),
                       state=('state','first'),
                       user_count=('time','count')).reset_index()

Output:

  User  avg_time  median_time state  user_count
0    A       1.5          1.5    CA           2
1    B       3.0          3.0    ID           1
2    C       4.0          4.0    OR           3
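
For reference, the same aggregation can be spelled out with explicit pd.NamedAgg objects, which some readers find easier to scan. This is only a sketch built on the question's sample data; the variable name df_res is borrowed from the question, and counting on the 'time' column assumes it has no missing values.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'User': ['A', 'A', 'B', 'C', 'C', 'C'],
    'time': [1, 2, 3, 4, 4, 4],
    'state': ['CA', 'CA', 'ID', 'OR', 'OR', 'OR'],
})

# Each keyword names an output column; NamedAgg says which input
# column to aggregate and which function to apply to it.
df_res = df.groupby('User').agg(
    avg_time=pd.NamedAgg(column='time', aggfunc='mean'),
    median_time=pd.NamedAgg(column='time', aggfunc='median'),
    state=pd.NamedAgg(column='state', aggfunc='first'),
    # 'count' skips NaN values; this assumes 'time' is never missing.
    user_count=pd.NamedAgg(column='time', aggfunc='count'),
).reset_index()

print(df_res)
```

Note that 'first' simply takes the first state seen for each user, so it only makes sense if a user never appears with more than one state.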

You can also pass several aggregate functions per column as a dictionary. Using the string names 'mean' and 'median' avoids needing NumPy here:

out = df.groupby('User').agg({'time': ['mean', 'median'], 'state': ['first']})

     time        state
     mean median first
User                  
A     1.5    1.5    CA
B     3.0    3.0    ID
C     4.0    4.0    OR

This gives multi-level columns. You can either drop a level or join the levels into flat names:

>>> out.columns = ['_'.join(col) for col in out.columns]

      time_mean  time_median state_first
User                                    
A           1.5          1.5          CA
B           3.0          3.0          ID
C           4.0          4.0          OR
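
Both options for handling the multi-level columns can be sketched side by side as follows. This is a minimal example on the question's sample data; the names dropped and flat are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'User': ['A', 'A', 'B', 'C', 'C', 'C'],
    'time': [1, 2, 3, 4, 4, 4],
    'state': ['CA', 'CA', 'ID', 'OR', 'OR', 'OR'],
})

out = df.groupby('User').agg({'time': ['mean', 'median'], 'state': ['first']})

# Option 1: drop the top column level, keeping only the function names.
# This is only safe when the function names are unique across columns.
dropped = out.droplevel(0, axis=1)

# Option 2: join both levels into flat names such as 'time_mean',
# then move 'User' from the index back into a regular column.
flat = out.copy()
flat.columns = ['_'.join(col) for col in flat.columns]
flat = flat.reset_index()

print(dropped)
print(flat)
```

Joining the levels is usually the safer choice, since dropping the top level would produce duplicate names if two columns used the same function.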
