Keep the last set of observations within a group that share the same (most recent) date

Is there a one-step way to keep only the latest observations within a "group"?

For example, I want to keep only the most recent observations for each PrimaryID-SecondaryID pair.

    PrimaryID  SecondaryID  SubAccount    Value  ReportDate
0           1            A         123  5618.48  2022-01-01
1           1            A         456  8206.23  2022-01-01
2           1            A         123  6722.05  2022-07-01
3           1            A         456  5500.53  2022-07-01
4           1            B         789  8990.75  2022-02-01
5           1            B         987  6294.63  2022-02-01
6           1            B         789  8389.60  2022-03-01
7           1            B         246   343.02  2022-03-01
8           2            X         234  4157.57  2022-02-01
9           2            X         752  8218.00  2022-02-01
10          2            X         234  6430.68  2022-03-01
11          2            X         755  7148.57  2022-03-01
12          2            Y         731  5406.63  2022-05-02
13          2            Y         480  2429.83  2022-05-02
14          2            Y         731  6251.38  2022-06-01
15          2            Y         841  8256.93  2022-06-01

This is one way to accomplish this, but it seems sloppy.

# latest ReportDate of each PrimaryID-SecondaryID group, broadcast back to every row
df['lastRptDt'] = df.groupby(['PrimaryID', 'SecondaryID'])['ReportDate'].transform('max')
df1 = df[df['ReportDate'] == df['lastRptDt']]

This is the desired output:

    PrimaryID  SecondaryID  SubAccount    Value  ReportDate   lastRptDt
2           1            A         123  6722.05  2022-07-01  2022-07-01
3           1            A         456  5500.53  2022-07-01  2022-07-01
6           1            B         789  8389.60  2022-03-01  2022-03-01
7           1            B         246   343.02  2022-03-01  2022-03-01
10          2            X         234  6430.68  2022-03-01  2022-03-01
11          2            X         755  7148.57  2022-03-01  2022-03-01
14          2            Y         731  6251.38  2022-06-01  2022-06-01
15          2            Y         841  8256.93  2022-06-01  2022-06-01
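The helper column is not strictly needed: the `transform('max')` result can be compared inline, which selects the same rows in a single expression. A minimal sketch, assuming `ReportDate` has been parsed with `pd.to_datetime`; the frame below just rebuilds the sample table, and unlike the desired output it does not add a `lastRptDt` column.

```python
import pandas as pd

# Sample data from the question, with ReportDate parsed as datetimes
df = pd.DataFrame({
    'PrimaryID':   [1] * 8 + [2] * 8,
    'SecondaryID': ['A'] * 4 + ['B'] * 4 + ['X'] * 4 + ['Y'] * 4,
    'SubAccount':  [123, 456, 123, 456, 789, 987, 789, 246,
                    234, 752, 234, 755, 731, 480, 731, 841],
    'Value':       [5618.48, 8206.23, 6722.05, 5500.53, 8990.75, 6294.63, 8389.60, 343.02,
                    4157.57, 8218.00, 6430.68, 7148.57, 5406.63, 2429.83, 6251.38, 8256.93],
    'ReportDate':  pd.to_datetime(['2022-01-01', '2022-01-01', '2022-07-01', '2022-07-01',
                                   '2022-02-01', '2022-02-01', '2022-03-01', '2022-03-01',
                                   '2022-02-01', '2022-02-01', '2022-03-01', '2022-03-01',
                                   '2022-05-02', '2022-05-02', '2022-06-01', '2022-06-01']),
})

# Each row gets its own group's latest date (same length and index as df),
# so the comparison is per group and nothing is stored back on df
latest = df.groupby(['PrimaryID', 'SecondaryID'])['ReportDate'].transform('max')
df1 = df[df['ReportDate'].eq(latest)]

print(df1.index.tolist())  # [2, 3, 6, 7, 10, 11, 14, 15]
```

Passing the string `'max'` rather than the builtin `max` keeps pandas on its optimized group reduction; recent pandas versions warn when a builtin is passed.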

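How about this? Move the group keys and the date into a MultiIndex, then pass the per-group maximum dates as the selector for the ReportDate level: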

df.set_index(['PrimaryID', 'SecondaryID', 'ReportDate']).loc[:,:,df.groupby(['PrimaryID', 'SecondaryID']).ReportDate.max()]
Out[54]: 
                                  SubAccount    Value  lastRptDt
PrimaryID SecondaryID ReportDate                                
1         A           2022-07-01         123  6722.05 2022-07-01
                      2022-07-01         456  5500.53 2022-07-01
          B           2022-03-01         789  8389.60 2022-03-01
                      2022-03-01         246   343.02 2022-03-01
2         X           2022-03-01         234  6430.68 2022-03-01
                      2022-03-01         755  7148.57 2022-03-01
          Y           2022-06-01         731  6251.38 2022-06-01
                      2022-06-01         841  8256.93 2022-06-01
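One caveat about `.loc[:, :, dates]`: the dates are matched against the ReportDate level on its own, so a row is kept whenever its date equals any group's latest date, not necessarily its own group's. It returns the right rows for this sample only because no group's older date coincides with another group's latest one. The sketch below matches the full (PrimaryID, SecondaryID, ReportDate) key instead, using `pd.MultiIndex.from_frame` and `MultiIndex.isin`, and keeps the original row labels. The helper `keep_latest`, the constant `KEYS` and the extra row in `df2` are hypothetical, added only to illustrate the difference.

```python
import pandas as pd

# Sample data from the question, with ReportDate parsed as datetimes
df = pd.DataFrame({
    'PrimaryID':   [1] * 8 + [2] * 8,
    'SecondaryID': ['A'] * 4 + ['B'] * 4 + ['X'] * 4 + ['Y'] * 4,
    'SubAccount':  [123, 456, 123, 456, 789, 987, 789, 246,
                    234, 752, 234, 755, 731, 480, 731, 841],
    'Value':       [5618.48, 8206.23, 6722.05, 5500.53, 8990.75, 6294.63, 8389.60, 343.02,
                    4157.57, 8218.00, 6430.68, 7148.57, 5406.63, 2429.83, 6251.38, 8256.93],
    'ReportDate':  pd.to_datetime(['2022-01-01', '2022-01-01', '2022-07-01', '2022-07-01',
                                   '2022-02-01', '2022-02-01', '2022-03-01', '2022-03-01',
                                   '2022-02-01', '2022-02-01', '2022-03-01', '2022-03-01',
                                   '2022-05-02', '2022-05-02', '2022-06-01', '2022-06-01']),
})

KEYS = ['PrimaryID', 'SecondaryID']  # hypothetical name for the group columns


def keep_latest(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep rows whose ReportDate is the latest in their own group."""
    # One row per group: PrimaryID, SecondaryID and that group's latest ReportDate
    latest = frame.groupby(KEYS)['ReportDate'].max().reset_index()
    latest_keys = pd.MultiIndex.from_frame(latest)
    # Build the same three-part key for every row and test membership,
    # so a date only counts when it is the latest for that same group
    row_keys = pd.MultiIndex.from_frame(frame[KEYS + ['ReportDate']])
    return frame[row_keys.isin(latest_keys)]


# Hypothetical extra row: group (1, A) gets an older report dated 2022-03-01,
# which is group (1, B)'s latest date but not (1, A)'s
extra = pd.DataFrame({'PrimaryID': [1], 'SecondaryID': ['A'], 'SubAccount': [999],
                      'Value': [0.0], 'ReportDate': pd.to_datetime(['2022-03-01'])})
df2 = pd.concat([df, extra], ignore_index=True)

# Date-only filter: what matching on the ReportDate level by itself amounts to
date_only = df2[df2['ReportDate'].isin(df2.groupby(KEYS)['ReportDate'].max())]

print(keep_latest(df).index.tolist())  # [2, 3, 6, 7, 10, 11, 14, 15]
print(16 in date_only.index, 16 in keep_latest(df2).index)  # True False
```

On the original data both filters agree; only the added row (label 16) separates them.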

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM