简体   繁体   中英

Fastest way to check if a list is present in a list of lists

I have a list

a=[[1,2,3,4,5,6],[7,8,9,10,11,12]]   

What is the fastest way to check if any list in a is present in another list of lists b , where

b=[[5, 9, 25, 31, 33, 36],[7,8,9,10,11,12],[10, 13, 22, 24, 33, 44]]

If any list in a is present in b, I would like to remove it. I'm currently using this code:

for each in a:
    for item in b:
        if set(each).issubset(item)
            a.remove(each)

This works but is quite slow when working with large lists so was wondering if there's a better way. The above code gives me the following result:

print(a)
[[1, 2, 3, 4, 5, 6]]

I am not worried about order, for example if a list in a is [1,2,3,4,5,6] I want it to be removed if there exist a list [1,2,3,4,5,6] or [3,4,1,6,2,5] etc in list b .

Using a list comprehension with set .

Ex:

a=[[1,2,3,4,5,6],[7,8,9,10,11,12]]  
b=[[5, 9, 25, 31, 33, 36],[7,8,9,10,11,12],[10, 13, 22, 24, 33, 44]]

setA = set(map(tuple, a))
setB = set(map(tuple, b))

print([i for i in setA if i not in setB])

Output:

[(1, 2, 3, 4, 5, 6)]

A functional solution is possible using set.difference :

res = set(map(tuple, a)).difference(set(map(tuple, b)))

[(1, 2, 3, 4, 5, 6)]

Explanation

  1. Since list is not a hashable type, we convert sublists to type tuple , which are immutable and hashable, eg set(map(tuple, a)) .
  2. Then use set.difference to take the difference between the 2 resulting sets.

If you don't care about elements order and frequencies, ie treat lists as unordered sets, then probably your solution is almost the correct one (removing an element while iterating the same list is probably not the best idea) with two serious suboptimalities.

First, you currently convert each b's element into a set once per each element of a. I wonder if Python compiler can optimize the repeated work out, at least you could try to do it on your own.

Next, you don't need to remove elements incorrectly and quadratically to simply filter them out.

faster_b = [frozenset(x) for x in b]

def not_in_b(list):
    l = frozenset(list)
    for x in faster_b:
        if l <= x: return False
    return True

print(list(filter(not_in_b, a)))

This one is probably faster.

$ python3
Python 3.6.5 (default, May 11 2018, 04:00:52) 
[GCC 8.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a=[[1,2,3,4,5,6],[7,8,9,10,11,12]]
>>> b=[[5, 9, 25, 31, 33, 36],[7,8,9,10,11,12],[10, 13, 22, 24, 33, 44]]
>>> faster_b = [frozenset(x) for x in b]
>>> 
>>> def not_in_b(list):
...     l = frozenset(list)
...     for x in faster_b:
...         if l <= x: return False
...     return True
... 
>>> print(list(filter(not_in_b, a)))
[[1, 2, 3, 4, 5, 6]]
>>> a=[[1, 1, 2, 3]]
>>> b=[[7, 3, 2, 1], [4, 5, 6]]
>>> faster_b = [frozenset(x) for x in b]
>>> print(list(filter(not_in_b, a)))
[]
>>> a=[[1, 1, 2, 3], [42, 5, 6]]
>>> print(list(filter(not_in_b, a)))
[[42, 5, 6]]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM