简体   繁体   中英

How do you group, sort, and limit in Python Pandas? (i.e. Get Top 10)

I have a Pandas dataframe that has columns actor_id and account_id. Actor is a person and account is simply an account. So a person can have more than one account and accounts can have multiple people.

My goal is to group by actor_id and then rank the actor_ids by the number of accounts they have so that I can get a list of the Top 10 actors with the most accounts.

In SQL, it would be something like SELECT actor_id, account_id, COUNT(account_id) GROUP BY actor_id LIMIT 10. But I am Trying to do this in Python.

I referenced this Pandas group and sort by index count but it did not work for me. Below is the code I've tried.

df['count'] = df['actor_id'].map(df['account_id'].value_counts())
df.sort_index('count', ascending=False)

The dataset looks like: 数据集

In the picture, replace project_id with account_id.

You can do:

df_nb_acc = (
    df.groupby('actor_id')['account_id'] #groupby actor_id, select the column account_id
      .count() # count the number of accout per actor
      .reset_index() # actor_id become a column and not indexes 
      .rename(columns={'account_id':'Nb_account'}) # to rename the column
      .sort_values('Nb_account',axis=1, ascending=False)
      # to sort the value on the column Nb_account, largest to smallest
    )

And to get the top 10 then do df_nb_acc.head(10)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM