
Checking for overlaps in two long lists of items in Python

I have two lists (list1 and list2) containing 10 million company names. Neither list has duplicates, but some companies appear in both lists, and I want to find out which ones. I wrote the code below:

list_matched = []
for i in range(len(list1)):
    for j in range(len(list2)):
        if list1[i] == list2[j]:
            list_matched.append(list1[i])

The problem with this code is that it never finishes executing. What can I do to complete this task within a reasonable amount of time? 10 million names seems to be too many to handle this way.

Use set logic. It is specifically designed for this task.

a = set(list1)
b = set(list2)

companies_in_both = a & b

(This will produce a set as the output. If you need it as a list, just pass the set to list().)
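As a rough illustration of why this helps: the nested loops compare every pair of items, which comes to about 10^14 comparisons if each list holds 10 million names, whereas building the sets and intersecting them takes time roughly proportional to the combined length of the lists on average, because set lookups are hash-based. Below is a minimal sketch with made-up sample data; the company names and the helper name `companies_in_both_lists` are hypothetical and not part of the question.

```python
def companies_in_both_lists(list1, list2):
    """Return the names that appear in both lists, as a sorted list."""
    # Set intersection does not preserve the original order,
    # so the result is sorted to make it deterministic.
    return sorted(set(list1) & set(list2))


# Hypothetical sample data standing in for the real 10-million-item lists.
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Hooli", "Initech", "Acme", "Stark"]

print(companies_in_both_lists(list1, list2))  # ['Acme', 'Initech']
```

Sorting is optional and only there for a stable result; drop it if any order is acceptable.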

I'd recommend making a set from one list and checking the other against it, e.g.:

inlist1 = set(list1)
list_matched = [x for x in list2 if x in inlist1]

Of course you can do it the other way round, depending on which list's order (if any) you want to preserve; this snippet preserves the order of list2.
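To make the choice of direction explicit, the same idea can be wrapped in a small function. This is only a sketch: the function name `matched_in_order` and the sample company names are hypothetical, chosen for illustration.

```python
def matched_in_order(ordered, other):
    """Return the items of `ordered` that also occur in `other`,
    keeping the order in which they appear in `ordered`."""
    lookup = set(other)  # built once; membership tests are O(1) on average
    return [x for x in ordered if x in lookup]


# Hypothetical sample data standing in for the real lists.
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Hooli", "Initech", "Acme", "Stark"]

print(matched_in_order(list1, list2))  # ['Acme', 'Initech']  (order of list1)
print(matched_in_order(list2, list1))  # ['Initech', 'Acme']  (order of list2)
```

Only the list passed as `other` is turned into a set, so with lists of very different sizes it may be worth weighing which one to convert against which order you need.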
