How to drop duplicates and keep the last timestamp in pandas

Question

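I want to drop duplicate rows and keep the one with the latest timestamp. Rows count as duplicates when they share the same customer_id and var_name. Here is my data: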

    customer_id  value   var_name     timestamp
    1            1       apple        2018-03-22 00:00:00.000        
    2            3       apple        2018-03-23 08:00:00.000
    2            4       apple        2018-03-24 08:00:00.000
    1            1       orange       2018-03-22 08:00:00.000
    2            3       orange       2018-03-24 08:00:00.000
    2            5       orange       2018-03-23 08:00:00.000

So the result would be:

    customer_id  value   var_name     timestamp
    1            1       apple        2018-03-22 00:00:00.000        
    2            4       apple        2018-03-24 08:00:00.000
    1            1       orange       2018-03-22 08:00:00.000
    2            3       orange       2018-03-24 08:00:00.000

Answer 1

I think you need sort_values followed by drop_duplicates:

df = df.sort_values('timestamp').drop_duplicates(['customer_id', 'var_name'], keep='last')
print(df)
   customer_id  value var_name                timestamp
0            1      1    apple  2018-03-22 00:00:00.000
3            1      1   orange  2018-03-22 08:00:00.000
2            2      4    apple  2018-03-24 08:00:00.000
4            2      3   orange  2018-03-24 08:00:00.000
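
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the question's table, and the pd.to_datetime step is an assumption on my part: it is only needed if the column arrives as strings, so that rows are ordered chronologically rather than lexically.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 1, 2, 2],
    "value": [1, 3, 4, 1, 3, 5],
    "var_name": ["apple", "apple", "apple", "orange", "orange", "orange"],
    "timestamp": [
        "2018-03-22 00:00:00.000",
        "2018-03-23 08:00:00.000",
        "2018-03-24 08:00:00.000",
        "2018-03-22 08:00:00.000",
        "2018-03-24 08:00:00.000",
        "2018-03-23 08:00:00.000",
    ],
})

# Assumption: parse strings to datetime so sorting is chronological
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Sort by time, then keep the last (latest) row of each key pair
latest = (
    df.sort_values("timestamp")
      .drop_duplicates(["customer_id", "var_name"], keep="last")
)
print(latest)
```

Note that the result comes back ordered by timestamp, not in the original row order.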

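If the original row order matters and you do not want to sort, select the row with the maximum timestamp in each group instead (the output below shows timestamp as a datetime column, so this presumably assumes it was converted first):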

df = df.loc[df.groupby(['customer_id', 'var_name'], sort=False)['timestamp'].idxmax()]
print(df)
   customer_id  value var_name           timestamp
0            1      1    apple 2018-03-22 00:00:00
2            2      4    apple 2018-03-24 08:00:00
3            1      1   orange 2018-03-22 08:00:00
4            2      3   orange 2018-03-24 08:00:00
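
The same approach as a runnable sketch, again with the question's data rebuilt by hand. Two assumptions here: the timestamp column is converted with pd.to_datetime before calling idxmax, and the index labels are unique (with repeated labels, df.loc could return more than one row per group).

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 1, 2, 2],
    "value": [1, 3, 4, 1, 3, 5],
    "var_name": ["apple", "apple", "apple", "orange", "orange", "orange"],
    "timestamp": [
        "2018-03-22 00:00:00.000",
        "2018-03-23 08:00:00.000",
        "2018-03-24 08:00:00.000",
        "2018-03-22 08:00:00.000",
        "2018-03-24 08:00:00.000",
        "2018-03-23 08:00:00.000",
    ],
})

# Assumption: idxmax is applied to a real datetime column
df["timestamp"] = pd.to_datetime(df["timestamp"])

# Index label of the latest timestamp within each (customer_id, var_name) group;
# sort=False keeps groups in order of first appearance
idx = df.groupby(["customer_id", "var_name"], sort=False)["timestamp"].idxmax()

# Select those rows by label
latest = df.loc[idx]
print(latest)
```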

Answer 2

Thank you very much for the solutions. FYI, the second solution is a bit slow.
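
If the groupby version is too slow and the original row order still matters, one possible workaround (not from the answer above, just a sketch) is to use the sort-based approach and then restore the order with sort_index. This assumes the index was in ascending order to begin with, as with the default RangeIndex; whether it is actually faster should be measured on your own data.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 1, 2, 2],
    "value": [1, 3, 4, 1, 3, 5],
    "var_name": ["apple", "apple", "apple", "orange", "orange", "orange"],
    "timestamp": pd.to_datetime([
        "2018-03-22 00:00:00.000",
        "2018-03-23 08:00:00.000",
        "2018-03-24 08:00:00.000",
        "2018-03-22 08:00:00.000",
        "2018-03-24 08:00:00.000",
        "2018-03-23 08:00:00.000",
    ]),
})

latest = (
    df.sort_values("timestamp")
      .drop_duplicates(["customer_id", "var_name"], keep="last")
      .sort_index()  # assumption: ascending index equals original row order
)
print(latest)
```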

Answer 1: score 5, accepted, 2018-03-22 10:08:19
Answer 2: score -2, 2022-01-06 09:12:07
