Keep last set of obs within a group with the same (most recent) date
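Is there a way to keep only the most recent observations within a "group"?
For example, I want to keep only the most recent observations for each PrimaryID-SecondaryID pair.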
    PrimaryID  SecondaryID  SubAccount    Value  ReportDate
0           1            A         123  5618.48  2022-01-01
1           1            A         456  8206.23  2022-01-01
2           1            A         123  6722.05  2022-07-01
3           1            A         456  5500.53  2022-07-01
4           1            B         789  8990.75  2022-02-01
5           1            B         987  6294.63  2022-02-01
6           1            B         789  8389.60  2022-03-01
7           1            B         246   343.02  2022-03-01
8           2            X         234  4157.57  2022-02-01
9           2            X         752  8218.00  2022-02-01
10          2            X         234  6430.68  2022-03-01
11          2            X         755  7148.57  2022-03-01
12          2            Y         731  5406.63  2022-05-02
13          2            Y         480  2429.83  2022-05-02
14          2            Y         731  6251.38  2022-06-01
15          2            Y         841  8256.93  2022-06-01
Here is one way to accomplish this, but it seems sloppy.
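# Latest ReportDate within each (PrimaryID, SecondaryID) group, broadcast to every row
df['lastRptDt'] = df.groupby(['PrimaryID', 'SecondaryID'])['ReportDate'].transform('max')
# Keep only the rows dated on their group's latest ReportDate
df1 = df[df['ReportDate'] == df['lastRptDt']]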
This is the desired output:
    PrimaryID  SecondaryID  SubAccount    Value  ReportDate   lastRptDt
2           1            A         123  6722.05  2022-07-01  2022-07-01
3           1            A         456  5500.53  2022-07-01  2022-07-01
6           1            B         789  8389.60  2022-03-01  2022-03-01
7           1            B         246   343.02  2022-03-01  2022-03-01
10          2            X         234  6430.68  2022-03-01  2022-03-01
11          2            X         755  7148.57  2022-03-01  2022-03-01
14          2            Y         731  6251.38  2022-06-01  2022-06-01
15          2            Y         841  8256.93  2022-06-01  2022-06-01
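For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample frame and applies the same group-maximum comparison without storing the helper column. Loading the data through `io.StringIO` and the names `raw`, `keys`, `latest` and `result` are assumptions made for illustration, not part of the original post.

```python
import io

import pandas as pd

# Sample data from the question, space-separated (the index is regenerated as 0..15).
raw = """PrimaryID SecondaryID SubAccount Value ReportDate
1 A 123 5618.48 2022-01-01
1 A 456 8206.23 2022-01-01
1 A 123 6722.05 2022-07-01
1 A 456 5500.53 2022-07-01
1 B 789 8990.75 2022-02-01
1 B 987 6294.63 2022-02-01
1 B 789 8389.60 2022-03-01
1 B 246 343.02 2022-03-01
2 X 234 4157.57 2022-02-01
2 X 752 8218.00 2022-02-01
2 X 234 6430.68 2022-03-01
2 X 755 7148.57 2022-03-01
2 Y 731 5406.63 2022-05-02
2 Y 480 2429.83 2022-05-02
2 Y 731 6251.38 2022-06-01
2 Y 841 8256.93 2022-06-01
"""
df = pd.read_csv(io.StringIO(raw), sep=" ", parse_dates=["ReportDate"])

keys = ["PrimaryID", "SecondaryID"]

# transform("max") returns one value per original row: that row's group maximum.
latest = df.groupby(keys)["ReportDate"].transform("max")

# Keep the rows whose date equals their own group's latest date.
result = df[df["ReportDate"] == latest]
print(result)
```

Because `transform` keeps the original index, the comparison lines up row by row and no extra `lastRptDt` column is left behind in the result.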
How about this?
df.set_index(['PrimaryID', 'SecondaryID', 'ReportDate']).loc[:,:,df.groupby(['PrimaryID', 'SecondaryID']).ReportDate.max()]
Out[54]:
                                  SubAccount    Value   lastRptDt
PrimaryID SecondaryID ReportDate
1         A           2022-07-01         123  6722.05  2022-07-01
                      2022-07-01         456  5500.53  2022-07-01
          B           2022-03-01         789  8389.60  2022-03-01
                      2022-03-01         246   343.02  2022-03-01
2         X           2022-03-01         234  6430.68  2022-03-01
                      2022-03-01         755  7148.57  2022-03-01
          Y           2022-06-01         731  6251.38  2022-06-01
                      2022-06-01         841  8256.93  2022-06-01
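One thing to watch with selecting on a list of maximum dates: as far as I can tell, the date selector matches against the pooled set of all group maxima rather than each group's own maximum. That gives the right rows for the sample data, because no group has an older row dated the same as another group's latest date. The sketch below uses four made-up rows (hypothetical, not from the question) and `isin` as a stand-in for the index lookup to show where the two approaches can differ.

```python
import pandas as pd

# Hypothetical data: group (1, "A") has an older row dated 2022-03-01,
# which happens to be the latest date of group (1, "B").
df = pd.DataFrame(
    {
        "PrimaryID": [1, 1, 1, 1],
        "SecondaryID": ["A", "A", "B", "B"],
        "Value": [10.0, 20.0, 30.0, 40.0],
        "ReportDate": pd.to_datetime(
            ["2022-03-01", "2022-07-01", "2022-02-01", "2022-03-01"]
        ),
    }
)
keys = ["PrimaryID", "SecondaryID"]

# Matching against the pooled group maxima also keeps the stale row 0.
pooled = df[df["ReportDate"].isin(df.groupby(keys)["ReportDate"].max())]

# Comparing each row with its own group's maximum keeps only rows 1 and 3.
per_group = df[df["ReportDate"] == df.groupby(keys)["ReportDate"].transform("max")]

print(pooled.index.tolist())
print(per_group.index.tolist())
```

If dates can repeat across groups like this, the per-group comparison with `transform("max")` looks like the safer choice.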