Checking for overlaps in two long lists of items in Python

Question

I have two lists (list1 and list2) that each contain 10 million company names. Neither list has duplicates, but some companies appear in both lists, and I want to find out which ones. I wrote the code below:

list_matched = []
for i in range(len(list1)):
    for j in range(len(list2)):
        if list1[i] == list2[j]:
            list_matched.append(list1[i])

The problem with this code is that it never finishes executing. What can I do to complete this task within a reasonable amount of time? Lists of 10 million names seem to be too big to handle this way.

Answer 1

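Use set logic; set intersection is designed for exactly this task. Building the two sets and intersecting them takes time roughly proportional to the total number of names on average, whereas the nested loops compare every pair, which is on the order of 10^14 comparisons for two lists of 10 million names.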

a = set(list1)
b = set(list2)

companies_in_both = a & b

(This will produce a set as the output. If you need it as a list, just pass the set to list().)
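As a minimal sketch of the intersection approach above, here is a small runnable example. The company names are made-up sample data standing in for the real lists, and sorting the result is only an assumption about what a caller might want, since sets have no defined order.

```python
# Hypothetical sample data standing in for the two 10-million-name lists.
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Hooli", "Initech", "Acme", "Stark"]

# Intersection keeps only the names present in both sets.
companies_in_both = set(list1) & set(list2)

# Sets are unordered, so sort if a reproducible list is needed.
matched_list = sorted(companies_in_both)
print(matched_list)  # ['Acme', 'Initech']
```

If the order does not matter, list(companies_in_both) avoids the extra sorting step.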

Answer 2

I'd recommend making a set from one list and checking the other against it, e.g.:

inlist1 = set(list1)
list_matched = [x for x in list2 if x in inlist1]

Of course you can do it the other way around, depending on which list's order (if any) you want to preserve; this snippet preserves the order of list2.
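To illustrate the point about order, here is a small sketch that wraps the same idea in a helper. The function name overlap_in_order and the sample names are hypothetical, chosen only for illustration; whichever list is passed first determines the order of the result.

```python
def overlap_in_order(ordered, other):
    """Return the items of `ordered` that also occur in `other`,
    keeping the order in which they appear in `ordered`."""
    lookup = set(other)  # average constant-time membership tests
    return [name for name in ordered if name in lookup]

# Hypothetical sample data.
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Hooli", "Initech", "Acme", "Stark"]

by_list2_order = overlap_in_order(list2, list1)
by_list1_order = overlap_in_order(list1, list2)
print(by_list2_order)  # ['Initech', 'Acme']
print(by_list1_order)  # ['Acme', 'Initech']
```

Only one set is built here, so this uses somewhat less memory than converting both lists, at the cost of a Python-level loop over the other list.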
