
Pandas: if a column contains a string, get the unique values from another column and drop those rows from the DataFrame

I have a small problem. I have a DataFrame with 7 columns. Two of them are 'ip' and 'url'.

It is a web log dataset. I am trying to get the unique IPs of the rows whose URL contains the string "robots.txt", and then drop every row that belongs to one of those IPs from the DataFrame.

I had a hard time solving this. I tried pandas groupby but still can't solve it. With this code I am able to get the unique IPs whose URL contains the string "robots.txt":

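# regex=False matches "robots.txt" literally (in a regex, "." matches any character)
robots = data2[data2.url.str.contains('robots.txt', regex=False)]
len(robots[['ip']].drop_duplicates())  # number of unique IPs that requested robots.txt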
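The following is a minimal sketch that counts the same unique IPs more directly with `nunique()`. It runs on a small hypothetical DataFrame named `data2` with `ip` and `url` columns, and all sample values are invented for illustration. It also shows why `regex=False` is the safer choice: with `regex=True`, the `.` in `"robots.txt"` matches any character.

```python
import pandas as pd

# Hypothetical toy web-log data (not the question's real dataset)
data2 = pd.DataFrame({
    "ip": ["1.1.1.1", "2.2.2.2", "1.1.1.1", "3.3.3.3", "4.4.4.4"],
    "url": ["/robots.txt", "/index.html", "/about", "/robots.txt", "/robotsXtxt"],
})

# Literal substring match: only the two real "/robots.txt" rows
literal = data2["url"].str.contains("robots.txt", regex=False)
# Regex match: "." is a wildcard, so "/robotsXtxt" matches too
as_regex = data2["url"].str.contains("robots.txt", regex=True)

# Number of distinct IPs that requested robots.txt
n_robot_ips = data2.loc[literal, "ip"].nunique()
print(n_robot_ips)          # 2
print(int(as_regex.sum()))  # 3
```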

But after that, I don't know how to drop these rows from the DataFrame. Does anyone have some tips? Thanks.


Here is the sample: https://i.stack.imgur.com/t6q39.png


The DataFrame has around 30k rows. The desired output is a DataFrame with all rows removed whose url column contains the string "robots.txt". I can do that part. The trick is to remember the values of the 'ip' column for the rows whose 'url' contains that string, and then also drop every other row accessed from those IP addresses.
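The steps described above can be sketched end to end: find the robots.txt rows, remember their IPs, then drop every row from those IPs. The sample data is hypothetical, and only the `ip` and `url` column names come from the question.

```python
import pandas as pd

# Hypothetical sample log; the real data has 7 columns and ~30k rows
data2 = pd.DataFrame({
    "ip":  ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.3", "10.0.0.2"],
    "url": ["/robots.txt", "/page1", "/page2", "/page3", "/robots.txt", "/page4"],
})

# 1. Rows whose URL contains the literal string "robots.txt"
robots_mask = data2["url"].str.contains("robots.txt", regex=False)

# 2. Remember the IPs that issued those requests
crawler_ips = data2.loc[robots_mask, "ip"].unique()

# 3. Keep only rows whose IP never requested robots.txt
#    (this also removes the robots.txt rows themselves)
cleaned = data2[~data2["ip"].isin(crawler_ips)]
print(cleaned)
```

Because `isin` checks membership per row, every request from a crawler IP is removed, not just the robots.txt request.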

Just negate your condition:

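robots_condition = data2.url.str.contains('robots.txt', regex=False)
no_crawl_ips = data2.loc[robots_condition, 'ip'].unique()
# Drop every row whose ip ever requested robots.txt (this includes the robots.txt rows themselves)
data2 = data2[~data2.ip.isin(no_crawl_ips)]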
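The question mentions trying `groupby`, so here is an alternative that swaps in `groupby(...).transform("any")` in place of `unique()` plus `isin()`. This is a sketch on hypothetical data. It flags each row if any row sharing its IP requested robots.txt, then drops the flagged rows.

```python
import pandas as pd

# Hypothetical sample log (not the question's real data)
data2 = pd.DataFrame({
    "ip":  ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.3"],
    "url": ["/page1", "/robots.txt", "/page2", "/robots.txt", "/page3"],
})

is_robots = data2["url"].str.contains("robots.txt", regex=False)

# For each row: True if any row with the same ip requested robots.txt.
# transform("any") broadcasts the per-group result back to every row.
ip_is_crawler = is_robots.groupby(data2["ip"]).transform("any")

cleaned = data2[~ip_is_crawler]
print(cleaned)
```

Both approaches keep the same rows. The `isin` version is arguably easier to read. The `transform` version never builds an explicit list of IPs.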


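Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.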

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM