I have two lists (list1 and list2) that contain 10 million company names. Neither list has duplicates, but some companies appear in both lists, and I want to find which ones. I wrote the code below:
list_matched = []
for i in range(len(list1)):
    for j in range(len(list2)):
        if list1[i] == list2[j]:
            list_matched.append(list1[i])
The problem with this code is that it never finishes executing. What can I do to finish this task within a reasonable amount of time? 10 million names seems to be too many to handle.
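Use set logic. Set intersection is designed for exactly this task: membership tests on a set take constant time on average, so you avoid comparing every name in list1 against every name in list2.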
a = set(list1)
b = set(list2)
companies_in_both = a & b
(This will produce a set as the output. If you need it as a list, just pass the set to list().)
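To make the difference concrete, here is a minimal sketch on a few made-up company names (the sample data and the name matched_list are placeholders for illustration, not part of the original question). It shows the same intersection, plus one way to get a list with a predictable order, since sets are unordered:

```python
# Hypothetical sample data standing in for the real 10-million-entry lists.
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Initech", "Hooli", "Acme", "Stark"]

# Building each set is linear in the list length, and the intersection is
# roughly proportional to the smaller set on average. The nested loops, by
# contrast, do len(list1) * len(list2) comparisons.
a = set(list1)
b = set(list2)
companies_in_both = a & b

# Sets have no defined order, so sort if a deterministic list is needed.
matched_list = sorted(companies_in_both)
print(matched_list)  # ['Acme', 'Initech']
```

With real data the two sets have to fit in memory at the same time, which is usually fine for 10 million short strings but worth keeping in mind.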
I'd recommend making a set from one list and checking the other list against it, e.g.:
inlist1 = set(list1)
list_matched = [x for x in list2 if x in inlist1]
Of course you can do it the other way around, depending on which list's order (if any) you want to preserve; this snippet preserves the order of list2.
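As a rough sketch of how the choice of direction affects the result, the helper below (common_in_order is a hypothetical name, and the sample data is made up) wraps the same idea so that either list's order can be kept:

```python
def common_in_order(ordered, other):
    """Return the items of `ordered` that also appear in `other`,
    in the order they occur in `ordered`."""
    lookup = set(other)  # average O(1) membership tests
    return [x for x in ordered if x in lookup]


# Hypothetical sample data for illustration only.
list1 = ["Acme", "Globex", "Initech", "Umbrella"]
list2 = ["Initech", "Hooli", "Acme", "Stark"]

by_list2_order = common_in_order(list2, list1)
by_list1_order = common_in_order(list1, list2)

print(by_list2_order)  # ['Initech', 'Acme']
print(by_list1_order)  # ['Acme', 'Initech']
```

Only one set is built here, so this uses less memory than converting both lists, at the cost of a Python-level loop over the other list.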