Checking the overlap of two long lists of items in Python

Question

I have two lists (list1 and list2), each containing 10 million company names. Neither list has duplicates, but some companies appear in both lists, and I want to find out which ones. I wrote the code below:

list_matched = []
for i in range(len(list1)):
    for j in range(len(list2)):
        if list1[i] == list2[j]:
            list_matched.append(list1[i])

The problem with this code is that it never finishes executing. What can I do to finish this task within a reasonable amount of time? 10 million names seems to be too many to handle.

Answer 1

Use set logic. It is specifically designed for this task.

a = set(list1)
b = set(list2)

companies_in_both = a & b

(This will produce a set as the output. If you need it as a list, just pass the set to list().)
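As a minimal sketch of the set approach on a small scale, the example below uses made-up company names as stand-ins for the real 10-million-entry lists. The variable matched_list is an illustrative name, not part of the original answer.

```python
# Hypothetical sample data standing in for the two large lists.
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Hooli", "Initech", "Acme", "Soylent"]

# Building each set takes one pass over its list. Set membership is a
# hash lookup, so the intersection avoids comparing every pair of names.
companies_in_both = set(list1) & set(list2)

# Sets are unordered, so sort the result if a stable list is needed.
matched_list = sorted(companies_in_both)
print(matched_list)  # ['Acme', 'Initech']
```

The nested loop in the question compares every pair, which for two lists of 10 million names is on the order of 10^14 comparisons. The set version does work roughly proportional to the combined length of the two lists.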

Answer 2

I'd recommend making a set from one list and checking the other against it, e.g.:

inlist1 = set(list1)
list_matched = [x for x in list2 if x in inlist1]

Of course you can do it the other way round, depending on which list's order (if any) you want to preserve. This snippet preserves the order of list2.
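To illustrate the point about ordering, here is a small sketch with made-up data. The helper name matched_in_order is hypothetical, introduced only for this example; it wraps the same set-plus-comprehension idea so the two directions can be compared.

```python
def matched_in_order(ordered, other):
    """Return the items of `ordered` that also occur in `other`,
    keeping the order in which they appear in `ordered`."""
    lookup = set(other)  # one pass to build a hash-based lookup
    return [x for x in ordered if x in lookup]

# Hypothetical sample data standing in for the two large lists.
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Hooli", "Initech", "Acme", "Soylent"]

print(matched_in_order(list2, list1))  # ['Initech', 'Acme']  (order of list2)
print(matched_in_order(list1, list2))  # ['Acme', 'Initech']  (order of list1)
```

Both calls find the same companies; only the order of the result differs, following whichever list is iterated.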

Answer 1: score 7, accepted, 2014-12-20 03:36:10

Answer 2: score 3, 2014-12-20 03:44:24