
pandas groupby latest observation for each group

I have a panel dataframe (keyed by ID and time) and want to keep only the most recent (latest) row for each ID. Here is the table:

import pandas as pd

df = pd.DataFrame({'ID': [1,1,2,3], 'Year': [2018,2019,2019,2020], 'Var1':list("abcd"), 'Var2': list("efgh")})


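and the end result should contain only the latest row for each ID: the 2019 row for ID 1, the 2019 row for ID 2, and the 2020 row for ID 3.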

Use tail, which keeps the last row of each group:

df.groupby("ID").tail(1)

The output is:

   ID  Year Var1 Var2
1   1  2019    b    f
2   2  2019    c    g
3   3  2020    d    h
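Note that tail(1) returns the last row of each group in the frame's current order, so it only gives the latest year when the rows are already sorted by Year within each ID, as they are in the example. A minimal sketch, assuming a hypothetical shuffled copy of the data (df_unsorted is not part of the original question), sorts before grouping:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 3], 'Year': [2018, 2019, 2019, 2020],
                   'Var1': list("abcd"), 'Var2': list("efgh")})

# Hypothetical unsorted input: the same rows in reverse order
df_unsorted = df.iloc[::-1]

# Sort by ID and Year so the last row within each ID is the latest one,
# then keep that last row per group
latest = df_unsorted.sort_values(['ID', 'Year']).groupby('ID').tail(1)
print(latest)
```

The original row labels are preserved, so the result can still be traced back to the input frame.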

Another alternative is to use last:

df.groupby("ID").last()
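Unlike tail, last moves ID into the index by default, and it takes the last non-null value in each column, so with missing data the result can combine values from different rows. The sketch below, using the example frame from the question, shows one way to keep ID as a regular column via as_index=False (the names by_index and as_column are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 2, 3], 'Year': [2018, 2019, 2019, 2020],
                   'Var1': list("abcd"), 'Var2': list("efgh")})

# Default: ID becomes the index of the result
by_index = df.groupby("ID").last()

# as_index=False keeps ID as an ordinary column
as_column = df.groupby("ID", as_index=False).last()

print(by_index)
print(as_column)
```

If the original row labels matter, tail(1) or drop_duplicates may be the better fit, since last builds a new frame rather than selecting existing rows.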

Alternatively, sort by Year and use drop_duplicates, keeping the last row for each ID:

df.sort_values('Year').drop_duplicates('ID', keep='last')

Output:

   ID  Year Var1 Var2
1   1  2019    b    f
2   2  2019    c    g
3   3  2020    d    h

