简体   繁体   中英

Faster way to perform sort by index on pandas group

I have a dataframe with name(person_name), color(shirt_color) as columns Each person wears a shirt with a certain color on a particular day (number of days can be arbitrary)

eg input:

name    color
----------------
John    White
John    White
John    Blue
John    Blue
John    White
Tom     Blue
Tom     Blue
Tom     Green
Tom     Black
Jerry   Black
Jerry   Blue
Jerry   Black

I need to find the best colored shirt worn by each person, according to best_color_order eg result:

name    color
-------------
Jerry   Blue
John    White
Tom     Green

I am doing the following :

best_color_order = ['White', 'Green', 'Blue', 'Black']

best_color_list = [[name, list(group['color']).sort(key=best_color_order.index)[0]]
                    for name, group in df.groupby(by='name', sort=False, as_index=False)]

best_color_df = pd.DataFrame(best_color_list, columns=['name', 'color'])

Is there a faster way to do this if I have millions of records?

Convert the column color to an ordered categorical so it will sort in your desired order, then sort the values by color , and finally groupby and take the first value for each name:

best_color_order = ['White', 'Green', 'Blue', 'Black']

df['color'] = pd.Categorical(df['color'], categories = best_color_order, ordered=True)

df.sort_values('color').groupby('name').first()

       color
name        
Jerry   Blue
John   White
Tom    Green

[EDIT] : A faster way might be to do the same, but instead of groupby, just drop the duplicate name s and keep the first (which is the default for the function drop_duplicates ):

best_color_order = ['White', 'Green', 'Blue', 'Black']

df['color'] = pd.Categorical(df['color'], categories = best_color_order, ordered=True)

df.sort_values('color').drop_duplicates('name')

     name  color
0    John  White
7     Tom  Green
10  Jerry   Blue

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM