Pandas: if a column contains a string, get unique values from another column and drop rows from the dataframe

Question

I have a small problem. I have a dataframe with 7 columns. Two of them are 'IP' and 'URL'.

It is a web log data set, so I am trying to get the unique IPs of rows where the URL contains the string "robots.txt", and then drop all rows belonging to those IPs from the dataframe.

I had a hard time trying to solve this. I tried pandas groupby but still can't solve it. I am able to get the unique IPs where the URL contains the string "robots.txt" with this code:

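# regex=False matches 'robots.txt' literally; with regex=True the '.' matches any character
robots = data2[data2.url.str.contains('robots.txt', regex=False)]
len(robots[['ip']].drop_duplicates())  # number of unique IPs that requested robots.txt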

But after that I don't know how to drop these rows from the dataframe. Does someone have some tips? Thanks.

Here is the sample: https://i.stack.imgur.com/t6q39.png

The dataframe has around 30k rows. The desired output is to drop all rows where the string "robots.txt" is in the url column. I can do that, but the trick is to remember the values in column 'ip' for rows where column 'url' contains that string, and then drop every row that was accessed from those IP addresses.

Answer 1

Just negate your condition:

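# regex=False matches the literal text 'robots.txt' (otherwise '.' matches any character)
robots_condition = data2.url.str.contains('robots.txt', regex=False)
# unique IPs that requested robots.txt at least once
no_crawl_ips = data2.loc[robots_condition, 'ip'].unique()
# keep only the rows whose IP is not among them
data2 = data2[~data2.ip.isin(no_crawl_ips)]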
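
To remove every row that comes from an IP that requested robots.txt (not only the robots.txt rows themselves), one option is to collect those IPs and filter with `isin`. Below is a minimal self-contained sketch, assuming the columns are named `ip` and `url` as in the question's code. The sample data and the name `filtered` are made up for illustration, and `na=False` is an added assumption so that missing URLs count as non-matches.

```python
import pandas as pd

# Hypothetical sample log; the real data has 7 columns and about 30k rows.
data2 = pd.DataFrame({
    "ip": ["1.1.1.1", "2.2.2.2", "1.1.1.1", "3.3.3.3", "2.2.2.2"],
    "url": ["/robots.txt", "/index.html", "/about.html", "/robots.txt", "/contact.html"],
})

# Literal substring match; na=False treats missing URLs as non-matches.
robots_condition = data2["url"].str.contains("robots.txt", regex=False, na=False)

# Unique IPs that requested robots.txt at least once.
no_crawl_ips = data2.loc[robots_condition, "ip"].unique()

# Keep only rows whose IP never requested robots.txt.
filtered = data2[~data2["ip"].isin(no_crawl_ips)]

print(sorted(no_crawl_ips))
print(filtered)
```

In this sample, 1.1.1.1 also requested /about.html, and that row is dropped as well because the filter works on the IP rather than on the URL.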

Solution 1
0 2021-03-13 14:09:12
