Python Pandas: compare DataFrame rows on certain columns and drop the rows with the lowest value

Question

I have a DataFrame df:

       first_seen              last_seen             uri
0   2015-05-11 23:08:46     2015-05-11 23:08:50 http://11i-ssaintandder.com/
1   2015-05-11 23:08:46     2015-05-11 23:08:46 http://11i-ssaintandder.com/
2   2015-05-02 18:27:10     2015-06-06 03:52:03 http://goo.gl/NMqjd1
3   2015-05-02 18:27:10     2015-06-08 08:44:53 http://goo.gl/NMqjd1

I want to remove rows that share the same "first_seen" and "uri", keeping only the row with the latest last_seen.

Here is an example of the expected dataset:

       first_seen              last_seen             uri
0   2015-05-11 23:08:46     2015-05-11 23:08:50 http://11i-ssaintandder.com/
3   2015-05-02 18:27:10     2015-06-08 08:44:53 http://goo.gl/NMqjd1

Does anyone know how to do this without writing a for loop?

Answer 1

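Call drop_duplicates, passing the columns to consider for duplicate matching as subset and setting keep='last' (this option was spelled take_last=True in pandas versions before 0.17):

In [295]:

df.drop_duplicates(subset=['first_seen','uri'], keep='last')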
Out[295]:
  index          first_seen            last_seen                           uri
1     1 2015-05-11 23:08:46  2015-05-11 23:08:46  http://11i-ssaintandder.com/
3     3 2015-05-02 18:27:10  2015-06-08 08:44:53          http://goo.gl/NMqjd1
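
The output above keeps row 1 rather than row 0, because drop_duplicates decides by row position, not by the value of last_seen. A minimal sketch that reproduces this, assuming the four rows from the question are rebuilt by hand with pd.to_datetime (the construction is illustrative, not taken from the original post) and using the current keep='last' spelling:

```python
import pandas as pd

# Rebuild the four rows from the question (hand-typed for illustration).
df = pd.DataFrame({
    "first_seen": pd.to_datetime([
        "2015-05-11 23:08:46", "2015-05-11 23:08:46",
        "2015-05-02 18:27:10", "2015-05-02 18:27:10",
    ]),
    "last_seen": pd.to_datetime([
        "2015-05-11 23:08:50", "2015-05-11 23:08:46",
        "2015-06-06 03:52:03", "2015-06-08 08:44:53",
    ]),
    "uri": [
        "http://11i-ssaintandder.com/", "http://11i-ssaintandder.com/",
        "http://goo.gl/NMqjd1", "http://goo.gl/NMqjd1",
    ],
})

# keep="last" keeps the last row of each duplicate group in the current
# row order; it never inspects the values in last_seen.
deduped = df.drop_duplicates(subset=["first_seen", "uri"], keep="last")
print(deduped.index.tolist())  # [1, 3]
```

Row 1 survives only because it appears after row 0, which is why the data needs to be sorted before deduplicating.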

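Edit

To get the latest date, you first need to sort df on "first_seen" and "last_seen", so that the row with the latest last_seen comes last within each group of duplicates (df.sort was replaced by df.sort_values in pandas 0.17):

In [317]:
df = df.sort_values(by=['first_seen','last_seen'], ascending=[False, True])
df.drop_duplicates(subset=['first_seen','uri'], keep='last')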

Out[317]:
  index          first_seen            last_seen                           uri
0     0 2015-05-11 23:08:46  2015-05-11 23:08:50  http://11i-ssaintandder.com/
3     3 2015-05-02 18:27:10  2015-06-08 08:44:53          http://goo.gl/NMqjd1
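
For anyone who wants to run the sort-then-deduplicate approach end to end, here is a self-contained sketch. It assumes a current pandas release (sort_values and keep='last') and rebuilds the question's rows by hand. The groupby/idxmax variant at the end is an alternative not mentioned in the answer, included only as a cross-check.

```python
import pandas as pd

# Rebuild the four rows from the question (hand-typed for illustration).
df = pd.DataFrame({
    "first_seen": pd.to_datetime([
        "2015-05-11 23:08:46", "2015-05-11 23:08:46",
        "2015-05-02 18:27:10", "2015-05-02 18:27:10",
    ]),
    "last_seen": pd.to_datetime([
        "2015-05-11 23:08:50", "2015-05-11 23:08:46",
        "2015-06-06 03:52:03", "2015-06-08 08:44:53",
    ]),
    "uri": [
        "http://11i-ssaintandder.com/", "http://11i-ssaintandder.com/",
        "http://goo.gl/NMqjd1", "http://goo.gl/NMqjd1",
    ],
})

# Sort so the latest last_seen is the last row within each
# (first_seen, uri) group, then keep that last row.
latest = (
    df.sort_values(by=["first_seen", "last_seen"], ascending=[False, True])
      .drop_duplicates(subset=["first_seen", "uri"], keep="last")
)
print(latest.index.tolist())  # [0, 3]

# Alternative (not from the answer): pick the row label of the maximum
# last_seen per group, then select those rows.
idx = df.groupby(["first_seen", "uri"])["last_seen"].idxmax()
alt = df.loc[idx].sort_index()
print(alt.index.tolist())  # [0, 3]
```

Both variants avoid an explicit for loop. The sort-based one stays closest to the answer above, while the idxmax one leaves the original row order untouched.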
