简体   繁体   中英

How to group by key and only return observation by max from pandas dataframe

How does one return all observations from a dataframe that only hold the max value for each unique key associated? I tried groupby and max but for every difference I get more than on value per key returned back. See example below:

import pandas as pd
key = [111,111,222,333,444,555]
flag = [0,1,1,1,1,1]
date = [pd.to_datetime('2020-01-01'),pd.to_datetime('2020-01-02'), 
        pd.to_datetime('2020-02-01'),pd.to_datetime('2020-04-01'),
        pd.to_datetime('2020-03-01'),pd.to_datetime('2020-05-01')]

df_dic = {'key':key, 'flag':flag, 'date':date}
df = pd.DataFrame(df_dic)
df
df.groupby(['key', 'flag']).agg({'date':'max'}).reset_index()

This returns:

key  flag       date
0  111     0 2020-01-01
1  111     1 2020-01-02
2  222     1 2020-02-01
3  333     1 2020-04-01

I want to return only the observations with the max date per unique key like so:

key  flag       date
0  111     1 2020-01-02
1  222     1 2020-02-01
2  333     1 2020-04-01

One possible solution (although maybe not the best one) is to use an inner join after the groupby method. This ensures that the flags are taken from the original dataframe, either they are 0 or 1.

df.groupby(['key'])\
    .agg({'date': 'max'})\
    .reset_index()\
    .merge(df, how='inner', on=['key', 'date'])

#    key       date  flag
# 0  111 2020-01-02     1
# 1  222 2020-02-01     1
# 2  333 2020-04-01     1
# 3  444 2020-03-01     1
# 4  555 2020-05-01     1

you can avoid groupby and use sort_values and drop_dplicates .

print(df.sort_values('date').drop_duplicates('key', keep='last'))
   key  flag       date
1  111     1 2020-01-02
2  222     1 2020-02-01
4  444     1 2020-03-01
3  333     1 2020-04-01
5  555     1 2020-05-01

Note that you can add the key column in the sort_values if the order matters to you in the result df.sort_values(['key', 'date'])...

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM