Keep the last set of observations in each group (the rows sharing the group's most recent date)

Question

Is there a one-step way to keep only the latest observations within a "group"?

For example, I want to keep only the most recent observations for each PrimaryID-SecondaryID pair.

    PrimaryID   SecondaryID     SubAccount  Value   ReportDate
0   1   A   123     5618.48     2022-01-01
1   1   A   456     8206.23     2022-01-01
2   1   A   123     6722.05     2022-07-01
3   1   A   456     5500.53     2022-07-01
4   1   B   789     8990.75     2022-02-01
5   1   B   987     6294.63     2022-02-01
6   1   B   789     8389.60     2022-03-01
7   1   B   246     343.02  2022-03-01
8   2   X   234     4157.57     2022-02-01
9   2   X   752     8218.00     2022-02-01
10  2   X   234     6430.68     2022-03-01
11  2   X   755     7148.57     2022-03-01
12  2   Y   731     5406.63     2022-05-02
13  2   Y   480     2429.83     2022-05-02
14  2   Y   731     6251.38     2022-06-01
15  2   Y   841     8256.93     2022-06-01
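To make the snippets in this thread reproducible, the sample table can be rebuilt as a pandas DataFrame. This is a sketch: the question doesn't show the dtype of `ReportDate`, so parsing it with `pd.to_datetime` is an assumption. ISO-formatted strings would also compare correctly, but real datetimes are safer for `max` and equality tests.

```python
import pandas as pd

# Sample data from the question (row order preserved, so the index is 0..15).
# Assumption: ReportDate is converted to datetime rather than left as strings.
df = pd.DataFrame({
    "PrimaryID":   [1] * 8 + [2] * 8,
    "SecondaryID": list("AAAABBBB") + list("XXXXYYYY"),
    "SubAccount":  [123, 456, 123, 456, 789, 987, 789, 246,
                    234, 752, 234, 755, 731, 480, 731, 841],
    "Value": [5618.48, 8206.23, 6722.05, 5500.53, 8990.75, 6294.63, 8389.60, 343.02,
              4157.57, 8218.00, 6430.68, 7148.57, 5406.63, 2429.83, 6251.38, 8256.93],
    "ReportDate": pd.to_datetime([
        "2022-01-01", "2022-01-01", "2022-07-01", "2022-07-01",
        "2022-02-01", "2022-02-01", "2022-03-01", "2022-03-01",
        "2022-02-01", "2022-02-01", "2022-03-01", "2022-03-01",
        "2022-05-02", "2022-05-02", "2022-06-01", "2022-06-01",
    ]),
})
print(df.dtypes)
```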

This is one way to accomplish this, but it seems sloppy:

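# Broadcast each (PrimaryID, SecondaryID) group's latest ReportDate back onto every row
df['lastRptDt'] = df.groupby(['PrimaryID', 'SecondaryID'])['ReportDate'].transform('max')
# Keep only the rows dated on their own group's latest date
df1 = df[df['ReportDate'] == df['lastRptDt']]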
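The same filter can be written as a single expression without storing a helper column. This is a minimal sketch that rebuilds the sample data so it runs on its own; converting `ReportDate` to datetime is an assumption, since the question doesn't show its dtype. It also passes the string `"max"` to `transform`, because recent pandas versions warn when the builtin `max` is passed and suggest the string form instead.

```python
import pandas as pd

# Sample data from the question; parsing ReportDate to datetime is an assumption.
df = pd.DataFrame({
    "PrimaryID":   [1] * 8 + [2] * 8,
    "SecondaryID": list("AAAABBBB") + list("XXXXYYYY"),
    "SubAccount":  [123, 456, 123, 456, 789, 987, 789, 246,
                    234, 752, 234, 755, 731, 480, 731, 841],
    "Value": [5618.48, 8206.23, 6722.05, 5500.53, 8990.75, 6294.63, 8389.60, 343.02,
              4157.57, 8218.00, 6430.68, 7148.57, 5406.63, 2429.83, 6251.38, 8256.93],
    "ReportDate": pd.to_datetime(
        ["2022-01-01"] * 2 + ["2022-07-01"] * 2 + ["2022-02-01"] * 2 + ["2022-03-01"] * 2
        + ["2022-02-01"] * 2 + ["2022-03-01"] * 2 + ["2022-05-02"] * 2 + ["2022-06-01"] * 2
    ),
})

keys = ["PrimaryID", "SecondaryID"]
# transform("max") returns a Series aligned to df's index holding each row's
# group maximum, so the comparison gives one boolean per row and df is not modified.
df1 = df[df["ReportDate"].eq(df.groupby(keys)["ReportDate"].transform("max"))]
print(df1)
```

This still computes the per-group maximum with `transform`; it only avoids adding `lastRptDt` to `df`. Ties are kept, so every row sharing a group's latest date survives, which matches the desired output.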

This is the desired output:

    PrimaryID   SecondaryID     SubAccount  Value   ReportDate  lastRptDt
2   1   A   123     6722.05     2022-07-01  2022-07-01
3   1   A   456     5500.53     2022-07-01  2022-07-01
6   1   B   789     8389.60     2022-03-01  2022-03-01
7   1   B   246     343.02  2022-03-01  2022-03-01
10  2   X   234     6430.68     2022-03-01  2022-03-01
11  2   X   755     7148.57     2022-03-01  2022-03-01
14  2   Y   731     6251.38     2022-06-01  2022-06-01
15  2   Y   841     8256.93     2022-06-01  2022-06-01

Answer 1

How about this?

df.set_index(['PrimaryID', 'SecondaryID', 'ReportDate']).loc[:,:,df.groupby(['PrimaryID', 'SecondaryID']).ReportDate.max()]

Out[54]: 
                                  SubAccount    Value  lastRptDt
PrimaryID SecondaryID ReportDate                                
1         A           2022-07-01         123  6722.05 2022-07-01
                      2022-07-01         456  5500.53 2022-07-01
          B           2022-03-01         789  8389.60 2022-03-01
                      2022-03-01         246   343.02 2022-03-01
2         X           2022-03-01         234  6430.68 2022-03-01
                      2022-03-01         755  7148.57 2022-03-01
          Y           2022-06-01         731  6251.38 2022-06-01
                      2022-06-01         841  8256.93 2022-06-01
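One caveat with the `.loc` lookup above: the third-level selector is the list of every group's maximum date, so it keeps any row whose `ReportDate` is the latest date of *some* group, not necessarily of its own group. It gives the right answer on this sample only because no group contains another group's latest date. The sketch below uses a small hypothetical frame (the labels `g1` and `g2` are made up) to show a date-only filter, which is what that lookup amounts to, next to a pair-aware match done with `DataFrame.merge`.

```python
import pandas as pd

# Hypothetical data: g1's latest date is 2022-07-01, g2's latest is 2022-03-01,
# and g1 also has an older row dated 2022-03-01.
df = pd.DataFrame({
    "PrimaryID":   [1, 1, 1, 2],
    "SecondaryID": ["g1", "g1", "g1", "g2"],
    "Value":       [10.0, 20.0, 30.0, 40.0],
    "ReportDate":  pd.to_datetime(["2022-03-01", "2022-07-01", "2022-07-01", "2022-03-01"]),
})
keys = ["PrimaryID", "SecondaryID"]

# One row per (PrimaryID, SecondaryID) pair holding that pair's latest date.
latest = df.groupby(keys)["ReportDate"].max().reset_index()

# Date-only match: wrongly keeps g1's stale 2022-03-01 row, since that date is g2's maximum.
date_only = df[df["ReportDate"].isin(latest["ReportDate"])]

# Pair-aware match: an inner merge on the group keys plus the date keeps only
# rows carrying their own group's latest date (note: merge resets the index).
pair_match = df.merge(latest, on=keys + ["ReportDate"], how="inner")

print(len(date_only), len(pair_match))
```

If the original row labels matter, the `transform("max")` mask from the question keeps them, whereas the merge returns a fresh 0..n-1 index.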


Solution 1
0 2022-07-01 01:05:57
