Checking for overlaps in two long lists of items in Python

I have two lists (list1 and list2) that contain 10 million company names. Neither list has duplicates, but some companies appear in both lists, and I want to find out which ones. I wrote the code below:

list_matched = []
for i in range(len(list1)):
    for j in range(len(list2)):
        if list1[i] == list2[j]:
            list_matched.append(list1[i])

The problem with this code is that it never finishes executing. What can I do to finish this task within a reasonable amount of time? Ten million names seems to be too much to handle.

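Use set logic. It is specifically designed for this task: the nested loops compare every name in one list against every name in the other, which means trillions of comparisons at this size, whereas a set lookup takes roughly constant time on average, so the intersection costs time roughly proportional to the lengths of the lists.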

a = set(list1)
b = set(list2)

companies_in_both = a & b

(This will produce a set as the output. If you need it as a list, just pass the set to list().)
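As a small illustration of the set approach above, here is a minimal sketch on made-up data. The company names are hypothetical placeholders standing in for the real lists, and sorting the result is only one way to get a repeatable order, since sets themselves are unordered.

```python
# Hypothetical sample data standing in for the two large lists.
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Hooli", "Initech", "Acme", "Stark"]

a = set(list1)
b = set(list2)

# Intersection: names present in both sets (same as a.intersection(b)).
companies_in_both = a & b

# Sets have no defined order, so sort if a repeatable list is needed.
matched_sorted = sorted(companies_in_both)
print(matched_sorted)  # ['Acme', 'Initech']
```

Building each set takes one pass over its list, so the work grows with the total number of names rather than with the product of the two list lengths.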

I'd recommend making a set from one list and checking the other against it, e.g.:

inlist1 = set(list1)
list_matched = [x for x in list2 if x in inlist1]

Of course, you can do it the other way around, depending on which list's order (if any) you want to preserve; this snippet preserves the order of list2.
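To make the point about ordering concrete, here is a minimal sketch that wraps the same idea in a helper. The function name matched_in_order and the sample data are hypothetical, introduced only for illustration.

```python
def matched_in_order(ordered, other):
    """Return the items of `ordered` that also occur in `other`,
    keeping the order in which they appear in `ordered`."""
    lookup = set(other)  # one pass to build; average O(1) membership tests
    return [x for x in ordered if x in lookup]


# Hypothetical sample data standing in for the two large lists.
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Hooli", "Initech", "Acme", "Stark"]

print(matched_in_order(list2, list1))  # ['Initech', 'Acme']  (order of list2)
print(matched_in_order(list1, list2))  # ['Acme', 'Initech']  (order of list1)
```

Only one set is built here, so this variant uses less extra memory than converting both lists, at the cost of returning a list rather than a set.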

