
Check if list elements are in a DataFrame column

Objective: I have a list of 200 elements (URLs) and I would like to check whether each one is in a specific column of the DataFrame. If it is, I would like to remove the element from the list.

Problem: I am trying a similar approach that adds the elements that are not in the column to a new list, but it adds all of them.

pruned = []
for element in list1:
    if element not in transfer_history['Link']:
        pruned.append(element)

I have also tried doing exactly what I described, removing matching elements from the list directly, without success. I think it's something simple, but I can't figure it out.

for element in list1:
    if element in transfer_history['Link']:
        list1.remove(element)

When you use the in operator with a pandas Series, it searches the index, not the values. To get around this, convert the column to a list using transfer_history['Link'].tolist(), or better, convert it to a set, since a membership test on a set is O(1) on average while a list is scanned element by element.

links = set(transfer_history["Link"])
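A small made-up DataFrame makes the index-versus-values behavior easy to see. The URLs below are hypothetical placeholders, and the DataFrame uses pandas' default integer index:

```python
import pandas as pd

# Hypothetical stand-in for transfer_history, with the default index 0, 1
transfer_history = pd.DataFrame({"Link": ["http://a.example", "http://b.example"]})

# `in` on a Series checks the index labels, not the values
print("http://a.example" in transfer_history["Link"])  # False: not an index label
print(0 in transfer_history["Link"])                   # True: 0 is an index label

# Converting the column to a set makes `in` test the values instead
links = set(transfer_history["Link"])
print("http://a.example" in links)                     # True
```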

A good way to filter the list is with a list comprehension:

pruned = [element for element in list1 if element not in links]
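Put together, the set lookup and the list comprehension could look like this minimal sketch. The contents of list1 and transfer_history are hypothetical sample data standing in for the 200 URLs and the real DataFrame:

```python
import pandas as pd

# Hypothetical data: list1 stands in for the URL list, transfer_history for the DataFrame
list1 = ["http://a.example", "http://b.example", "http://c.example"]
transfer_history = pd.DataFrame({"Link": ["http://b.example", "http://d.example"]})

# Build the lookup set once; each membership test is then O(1) on average
links = set(transfer_history["Link"])

# Keep only the URLs that do not appear in the column, preserving list order
pruned = [element for element in list1 if element not in links]
print(pruned)  # ['http://a.example', 'http://c.example']
```

Because the comprehension walks list1 in order, the surviving URLs keep their original order and any duplicates in list1 are kept.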

Don't remove elements from a list while iterating over it; this can have unexpected results, such as elements being skipped.
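The skipping problem shows up as soon as two matching items sit next to each other. This sketch uses hypothetical placeholder strings instead of URLs, and contrasts the buggy loop with iterating over a copy:

```python
# Hypothetical data: two consecutive items that should both be removed
list1 = ["x", "drop1", "drop2", "y"]
to_remove = {"drop1", "drop2"}

# Removing while iterating shifts the remaining items left, so the loop skips one
buggy = list(list1)
for element in buggy:
    if element in to_remove:
        buggy.remove(element)
print(buggy)  # ['x', 'drop2', 'y']: 'drop2' was never examined

# Iterating over a copy (safe[:]) while removing from the original avoids the skip
safe = list(list1)
for element in safe[:]:
    if element in to_remove:
        safe.remove(element)
print(safe)  # ['x', 'y']
```

Building a new list, as the comprehension in this answer does, avoids the issue entirely and is usually the clearer choice.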

Remember, transfer_history['Link'] refers to the entire column. To get at the individual items, you need to index into it, as in transfer_history['Link'][x], or use a for loop to iterate through each item in the column.
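The explicit loop described here could be sketched as follows. The helper name in_column and the sample data are hypothetical. Note that iterating over a Series yields its values, not its index labels, so no positional indexing is needed:

```python
import pandas as pd

def in_column(element, column):
    """Return True if element equals any value in the column (a linear scan)."""
    for value in column:  # iterating a Series yields its values
        if value == element:
            return True
    return False

# Hypothetical data
list1 = ["http://a.example", "http://b.example"]
transfer_history = pd.DataFrame({"Link": ["http://b.example"]})

pruned = [element for element in list1 if not in_column(element, transfer_history["Link"])]
print(pruned)  # ['http://a.example']
```

Each call scans the whole column, so this is O(len(list1) × len(column)). For 200 URLs that is usually fine, but the set-based approach scales better.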

Or a much easier way is to simply check whether the item is in a list built from the entire column, using a one-liner:

column_links = [link for link in transfer_history['Link']]  # build the list once instead of on every iteration
pruned = []
for element in list1:
    if element not in column_links:
        pruned.append(element)
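Another option, not taken from the answers above, is pandas' own Series.isin, which tests values (not index labels) and keeps the original order. Wrapping list1 in a Series is an assumption of this sketch, and the data is hypothetical:

```python
import pandas as pd

# Hypothetical data
list1 = ["http://a.example", "http://b.example", "http://c.example"]
transfer_history = pd.DataFrame({"Link": ["http://b.example", "http://d.example"]})

urls = pd.Series(list1)
# isin returns a boolean mask: True where the URL appears among the column's values
mask = urls.isin(transfer_history["Link"])
pruned = urls[~mask].tolist()
print(pruned)  # ['http://a.example', 'http://c.example']
```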

If the order of the URLs doesn't matter, this can be simplified a lot using sets (note that this also drops any duplicate URLs in list1):

list1 = list(set(list1) - set(transfer_history['Link']))
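The trade-off of the set difference is easiest to see with a duplicate in the input. The data below is hypothetical. The result is sorted only so the printed output is stable, because set iteration order is not guaranteed to match the input:

```python
import pandas as pd

# Hypothetical data with a duplicate URL in list1
list1 = ["http://c.example", "http://a.example", "http://b.example", "http://a.example"]
transfer_history = pd.DataFrame({"Link": ["http://b.example"]})

# Set difference removes the matches, but also drops duplicates and the original order
result = list(set(list1) - set(transfer_history["Link"]))
print(sorted(result))  # ['http://a.example', 'http://c.example']
print(len(result))     # 2, even though list1 contained 'http://a.example' twice
```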

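Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.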

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM