
How to efficiently check if a list is in another list of lists python

I have two lists (listA, listB), each composed of many lists of tuples.

E.g.

listA = [ [(0,1), (1,2) ... ] , [(5,6), (6,10)] , ... ] # can have 5000 lists, each with 100+ tuples
listB = [...] # about the same structure

I want to loop over each list in listA and, if it is not already in listB, append it to listB.

So it is something like this:

for lst in listA:
    if lst not in listB: # membership checking
        listB.append(lst)

I have hundreds of thousands of such tasks to perform and it seems to be really slow when listA and listB get bigger. The membership checking seems to be the bottleneck here. I've tried using a string '0-1' instead of a tuple of ints, but it is not getting any faster. Does anyone know how to optimize the code? Is list membership checking really slow?

Any help is greatly appreciated. Thanks!

------------- EDIT: this is what I ended up using -------------

Thank you, guys. Converting the nested lists to tuples and using a set works! But I have to be careful when looping over listA: each nested list also has to be converted to a tuple (but just for the membership check!). I still need to append the nested list to listB as a list. That is:

# first convert listB to a set of tuples
listB_as_set = {tuple(x) for x in listB}  # O(N)

for lst in listA:
    # convert the nested list to a tuple (used only for the membership check)
    lst_tuple = tuple(lst)
    # membership check
    if lst_tuple not in listB_as_set:  # now O(1) on average, originally O(N)
        listB.append(lst)  # still appending as a list to listB
        listB_as_set.add(lst_tuple)  # keep the set in sync with listB

Assuming both lists have length N, and ignoring the time for converting lst to lst_tuple and appending lst to listB, we get an improvement from O(N^2) to O(N), if I'm not mistaken.
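
As a self-contained sketch of the approach in this edit: the function name append_missing is made up for illustration, and the code assumes every inner list holds only hashable items such as tuples of ints. Like the original `lst not in listB` loop, it also skips an inner list that shows up twice in listA.

```python
def append_missing(listA, listB):
    """Append to listB (in place) each inner list of listA not already in it."""
    # Build the lookup set once: one tuple conversion per inner list of listB.
    seen = {tuple(inner) for inner in listB}
    for inner in listA:
        key = tuple(inner)       # lists are unhashable, tuples are hashable
        if key not in seen:      # hash lookup, O(1) on average
            listB.append(inner)  # listB keeps the original list object
            seen.add(key)        # so a repeat inside listA is not appended twice
    return listB


listA = [[(0, 1), (1, 2)], [(5, 6), (6, 10)], [(0, 1), (1, 2)]]
listB = [[(5, 6), (6, 10)], [(7, 8)]]
append_missing(listA, listB)
print(listB)
```

The order of listB is preserved, and new items are appended in the order they first appear in listA.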

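If you would like to store values in order to check for their existence, sets are significantly faster. So you can try this and then use the for loop; it will be faster than with a list. Note that set() requires hashable elements, so the line below works as written only when the items are hashable (tuples, for example, but not lists).

listA, listB = set(listA), set(listB)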

That is because a set uses a hash function to map each element to a bucket. Since Python implementations automatically resize the hash table, a lookup takes constant time, O(1), on average.

Sets are significantly faster than lists when it comes to determining whether an object is in the collection, but slower than lists when it comes to iterating over their contents.
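
To see the membership difference on your own machine, a rough timing sketch like the following may help. The sizes are made up (smaller than in the question so it runs quickly), and the absolute numbers will vary from system to system.

```python
import timeit

# Hypothetical data: 2000 inner lists of 20 tuples each.
outer = [[(i, j) for j in range(20)] for i in range(2000)]
as_list = outer
as_set = {tuple(inner) for inner in outer}

target = outer[-1]  # worst case for the list: the match is the last element

# The list check scans element by element; the set check hashes the tuple.
list_time = timeit.timeit(lambda: target in as_list, number=200)
set_time = timeit.timeit(lambda: tuple(target) in as_set, number=200)

print(f"list membership: {list_time:.4f}s, set membership: {set_time:.4f}s")
```

The set timing here includes the tuple(target) conversion, which is the cost you also pay inside the real loop.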


If you're using nested lists, you can try

listA = [[(0, 1), (1, 2)], [(5, 6), (6, 10)]]
listA = {tuple(i) for i in listA}

Or

listA = {frozenset(i) for i in listA} 

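The frozenset type is immutable and hashable, and it ignores the order of its elements (and any duplicates), so use it only if the order of the tuples inside each list does not matter:

frozenset([(0, 1), (1, 2)]) == frozenset([(1, 2), (0, 1)])  # True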
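
The choice between tuple and frozenset changes what counts as "the same list". A small sketch, using made-up sample data, of how the two keys differ:

```python
a = [(0, 1), (1, 2)]
b = [(1, 2), (0, 1)]          # same pairs as a, different order
c = [(0, 1), (0, 1), (1, 2)]  # same pairs as a, one of them repeated

# tuple keys are order-sensitive and keep duplicates
print(tuple(a) == tuple(b))          # False
# frozenset keys ignore both order and duplicates
print(frozenset(a) == frozenset(b))  # True
print(frozenset(a) == frozenset(c))  # True

print(len({tuple(x) for x in (a, b, c)}))      # 3 distinct keys
print(len({frozenset(x) for x in (a, b, c)}))  # 1 distinct key
```

Since the original `lst not in listB` check compares lists element by element and in order, tuple keys are the ones that reproduce its behavior.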

Hope this helps.

The way you are doing it now is an O(N^2) operation because of the nature of lists. But if you use sets, it becomes approximately O(n+m); see here for details: https://wiki.python.org/moin/TimeComplexity

So the approach is

a = set(listA)
b = set(listB)

b = b.union(a)  # union() returns a new set, so the result must be assigned

Just three lines of code and much faster too. A good point was raised by AChampion about unhashable lists. In that case

a = set([ tuple(x) for x in listA ])

would work.
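
Putting the union idea together with the tuple conversion, a minimal sketch might look like this. The sample data is made up, and converting back to a list of lists at the end is an assumption about what the asker needs.

```python
listA = [[(0, 1), (1, 2)], [(5, 6), (6, 10)]]
listB = [[(5, 6), (6, 10)], [(7, 8)]]

# Inner lists are unhashable, so convert each one to a tuple first.
a = {tuple(x) for x in listA}
b = {tuple(x) for x in listB}

merged = b | a  # same as b.union(a); builds a new set
result = [list(t) for t in merged]  # back to a list of lists

print(len(result))  # 3
```

A set has no defined order, so the order of result is arbitrary. If the original order of listB matters, appending inside a loop while checking against a set is the safer route.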
