Python pandas: Compare rows of dataframe based on some columns and drop row with lowest value

I have a dataframe df:

       first_seen              last_seen             uri
0   2015-05-11 23:08:46     2015-05-11 23:08:50 http://11i-ssaintandder.com/
1   2015-05-11 23:08:46     2015-05-11 23:08:46 http://11i-ssaintandder.com/
2   2015-05-02 18:27:10     2015-06-06 03:52:03 http://goo.gl/NMqjd1
3   2015-05-02 18:27:10     2015-06-08 08:44:53 http://goo.gl/NMqjd1

I want to drop rows that share the same "first_seen" and "uri", keeping only the row with the latest last_seen.

Here is an example of the expected dataset:

       first_seen              last_seen             uri
0   2015-05-11 23:08:46     2015-05-11 23:08:50 http://11i-ssaintandder.com/
3   2015-05-02 18:27:10     2015-06-08 08:44:53 http://goo.gl/NMqjd1

Does anyone know how to do this without writing a for loop?

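Call drop_duplicates, pass the columns to consider for duplicate matching as the subset argument, and set keep='last' (older pandas versions spelled this take_last=True):

In [295]:

df.drop_duplicates(subset=['first_seen','uri'], keep='last')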
Out[295]:
  index          first_seen            last_seen                           uri
1     1 2015-05-11 23:08:46  2015-05-11 23:08:46  http://11i-ssaintandder.com/
3     3 2015-05-02 18:27:10  2015-06-08 08:44:53          http://goo.gl/NMqjd1
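
Note that the output above keeps row 1, whose last_seen is not the latest. A minimal sketch of why, assuming a reasonably recent pandas where the argument is spelled keep='last': drop_duplicates picks the surviving row by position within each duplicate group, not by the value of any other column. The two-row frame below is rebuilt from the question's first two rows for illustration only.

```python
import pandas as pd

# Two rows from the question that share first_seen and uri.
df = pd.DataFrame({
    "first_seen": pd.to_datetime(["2015-05-11 23:08:46", "2015-05-11 23:08:46"]),
    "last_seen": pd.to_datetime(["2015-05-11 23:08:50", "2015-05-11 23:08:46"]),
    "uri": ["http://11i-ssaintandder.com/"] * 2,
})

# keep="last" keeps the last row *by position* in each duplicate group,
# regardless of what last_seen contains.
deduped = df.drop_duplicates(subset=["first_seen", "uri"], keep="last")
print(deduped)
```

Row 1 survives simply because it comes second in the frame, so the rows have to be ordered by last_seen before deduplicating.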

EDIT

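To get the latest date, you first need to sort df on 'first_seen' and 'last_seen', so that the row with the latest last_seen comes last within each group. In current pandas this is done with sort_values, since the older df.sort has been removed:

In [317]:
df = df.sort_values(by=['first_seen','last_seen'], ascending=[False, True])
df.drop_duplicates(subset=['first_seen','uri'], keep='last')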

Out[317]:
  index          first_seen            last_seen                           uri
0     0 2015-05-11 23:08:46  2015-05-11 23:08:50  http://11i-ssaintandder.com/
3     3 2015-05-02 18:27:10  2015-06-08 08:44:53          http://goo.gl/NMqjd1
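
For reference, here is a self-contained sketch of the sort-then-deduplicate approach, assuming a recent pandas version. The frame is rebuilt by hand from the question's sample data, and the names result and alt are illustrative only. The second variant, using groupby with idxmax, is an alternative technique that does not appear in the answer above; it selects the row with the maximum last_seen in each group directly.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "first_seen": pd.to_datetime([
        "2015-05-11 23:08:46", "2015-05-11 23:08:46",
        "2015-05-02 18:27:10", "2015-05-02 18:27:10",
    ]),
    "last_seen": pd.to_datetime([
        "2015-05-11 23:08:50", "2015-05-11 23:08:46",
        "2015-06-06 03:52:03", "2015-06-08 08:44:53",
    ]),
    "uri": [
        "http://11i-ssaintandder.com/", "http://11i-ssaintandder.com/",
        "http://goo.gl/NMqjd1", "http://goo.gl/NMqjd1",
    ],
})

# Sort so the latest last_seen is last within each (first_seen, uri) group,
# then keep the last row of every group.
result = (
    df.sort_values(by=["first_seen", "uri", "last_seen"])
      .drop_duplicates(subset=["first_seen", "uri"], keep="last")
      .sort_index()
)

# Alternative (not from the answer above): take the index label of the
# maximum last_seen per group and select those rows.
alt = df.loc[df.groupby(["first_seen", "uri"])["last_seen"].idxmax()].sort_index()

print(result)
```

Both variants avoid an explicit for loop. The sort-based one depends on row order, while the idxmax one states the intent (maximum last_seen per group) more directly.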
