
Fastest Way To Remove Duplicates In Lists Python

I have two very large lists. Looping through them once takes at least a second, and I need to do it 200,000 times. What's the fastest way to remove duplicates across the two lists and form a single list?

This is the fastest way I can think of:

import itertools
output_list = list(set(itertools.chain(first_list, second_list)))

Slight update: As jcd points out, depending on your application, you probably don't need to convert the result back to a list. Since a set is itself iterable, you might be able to just use it directly:

output_set = set(itertools.chain(first_list, second_list))
for item in output_set:
    pass  # do something with item

Beware, though, that any solution involving set() will probably reorder the elements of your lists, since sets are unordered, so there's no guarantee that the elements will come out in any particular order. That said, since you're combining two lists, it's hard to come up with a good reason why you would need a particular ordering over them anyway, so this is probably not something you need to worry about.
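If the order does turn out to matter, one alternative (not the set() approach above) is to deduplicate through dict keys, which keep insertion order on Python 3.7+. This is only a minimal sketch: the function name combine_keep_first_seen is made up for illustration, and like set() it assumes the elements are hashable.

```python
import itertools


def combine_keep_first_seen(first_list, second_list):
    """Merge two lists, dropping duplicates but keeping first-seen order.

    Hypothetical helper: relies on dict keys being unique and, on
    Python 3.7+, preserving insertion order. Elements must be hashable.
    """
    return list(dict.fromkeys(itertools.chain(first_list, second_list)))
```

Whether this is faster or slower than the plain set() version on your data is something you'd have to measure.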

I'd recommend something like this:

def combine_lists(list1, list2):
    s = set(list1)
    s.update(list2)
    return list(s)

This avoids building a monster intermediate list that is the concatenation of the two inputs.

Depending on what you're doing with the output, you may not need to bother converting it back to a list. If ordering is important, you might need some sort of decorate/sort/undecorate shenanigans around this.
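For what it's worth, here is one way the decorate/sort/undecorate idea could look if "ordering" means "order of first appearance across both lists". It's a rough sketch under that assumption, and combine_preserving_order is a hypothetical name, not something from the answer above.

```python
import itertools


def combine_preserving_order(list1, list2):
    """Deduplicate two lists and return items in order of first appearance."""
    # Decorate: record the first position at which each item shows up.
    first_seen = {}
    for index, item in enumerate(itertools.chain(list1, list2)):
        first_seen.setdefault(item, index)
    # Sort the (position, item) pairs. Positions are unique, so the items
    # themselves are never compared with each other.
    decorated = sorted((index, item) for item, index in first_seen.items())
    # Undecorate: drop the position and keep only the item.
    return [item for index, item in decorated]
```

The extra sort costs O(n log n) on the number of unique items, so it's only worth it when you actually need the order back.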

As Daniel states, a set cannot contain duplicate entries, so concatenate the lists:

list1 + list2

Then convert the new list to a set:

set(list1 + list2)

Then back to a list:

list(set(list1 + list2))

result = list(set(list1).union(set(list2)))

That's how I'd do it. I'm not so sure about the performance, though, but it's certainly better than doing it by hand.
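Since performance is the open question, a quick way to settle it is to time the variants from this thread on your own data. Below is a small sketch using the standard timeit module. The via_* names, the time_all helper, and the sample data are all made up for illustration, and the numbers will depend on your lists and your machine.

```python
import itertools
import timeit


def via_chain(list1, list2):
    return list(set(itertools.chain(list1, list2)))


def via_update(list1, list2):
    s = set(list1)
    s.update(list2)
    return list(s)


def via_concat(list1, list2):
    return list(set(list1 + list2))


def via_union(list1, list2):
    return list(set(list1).union(list2))


def time_all(list1, list2, repeats=5):
    """Return {function name: best wall-clock time in seconds for one call}."""
    timings = {}
    for func in (via_chain, via_update, via_concat, via_union):
        runs = timeit.repeat(lambda f=func: f(list1, list2),
                             number=1, repeat=repeats)
        timings[func.__name__] = min(runs)
    return timings


if __name__ == "__main__":
    # Hypothetical sample data: two overlapping ranges of integers.
    a = list(range(0, 100_000))
    b = list(range(50_000, 150_000))
    for name, seconds in sorted(time_all(a, b).items(), key=lambda kv: kv[1]):
        print(f"{name}: {seconds:.4f} s")
```

Swap in your real lists for a and b before drawing any conclusions.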


 