I have a list that contains a number of sublists. For example:
full_list = [[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]]
I also have another list, called omit. For example:
omit = [99, 60, 98]
I want to remove the sublists inside of full_list, if any element in that sublist is in the omit list. For example, I would want the resulting list to be:
reduced_list = [[1, 1, 3, 4], [2, 4, 4]]
because only these sublists do not have an element that is in the omit list.
I am guessing that there is some easy way to pull this off with a list comprehension but I cannot get it to work. I have tried a bunch of things: For example:
reduced_list = [sublist for sublist in full_list if item for sublist not in omit]
Any help would be much appreciated!
ps, The above is a simplified problem. My end goal is to remove sublists from a very long list (eg, 500,000 sublists) of strings if any element (a string) of those sublists is in an "omit" list contain over 2000 strings.
Use set
and all()
:
>>> omit = {99, 60, 98}
>>> full_list = [[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]]
>>> [item for item in full_list if all(x not in omit for x in item)]
[[1, 1, 3, 4], [2, 4, 4]]
Main difference between this method and @alecxe's(or @Óscar López's) solution is that it all
short-circuits and doesn't create any set or list in the memory while set-intersection returns a new set that contains all items that are common with omit
set and it's length is checked to determine whether any item was common or not.(set-intersection happens internally at C speed so it is faster than normal python loops used in all
)
>>> import random
No items intersect:
>>> omit = set(random.randrange(1, 10**18) for _ in xrange(100000))
>>> full_list = [[random.randrange(10**19, 10**100) for _ in xrange(100)] for _ in xrange(1000)]
>>> %timeit [item for item in full_list if not omit & set(item)]
10 loops, best of 3: 43.3 ms per loop
>>> %timeit [x for x in full_list if not omit.intersection(x)]
10 loops, best of 3: 28 ms per loop
>>> %timeit [item for item in full_list if all(x not in omit for x in item)]
10 loops, best of 3: 65.3 ms per loop
All items intersect:
>>> full_list = [range(10**3) for _ in xrange(1000)]
>>> omit = set(xrange(10**3))
>>> %timeit [item for item in full_list if not omit & set(item)]
1 loops, best of 3: 148 ms per loop
>>> %timeit [x for x in full_list if not omit.intersection(x)]
1 loops, best of 3: 108 ms per loop
>>> %timeit [item for item in full_list if all(x not in omit for x in item)]
100 loops, best of 3: 1.62 ms per loop
Some items intersect:
>>> omit = set(xrange(1000, 10000))
>>> full_list = [range(2000) for _ in xrange(1000)]
>>> %timeit [item for item in full_list if not omit & set(item)]
1 loops, best of 3: 282 ms per loop
>>> %timeit [x for x in full_list if not omit.intersection(x)]
1 loops, best of 3: 159 ms per loop
>>> %timeit [item for item in full_list if all(x not in omit for x in item)]
1 loops, best of 3: 227 ms per loop
Try this:
full_list = [[1, 1, 3, 4], [3, 99, 5, 2], [2, 4, 4], [3, 4, 5, 2, 60]]
omit = frozenset([99, 60, 98])
reduced_list = [x for x in full_list if not omit.intersection(x)]
The only change that I made to the input data is that omit
is now a set, for efficiency reasons, as it will allow us to perform a fast intersection (it's frozen because we're not going to modify it), notice that x
doesn't have to be a set. Now the reduced_list
variable will contain the expected value:
reduced_list
=> [[1, 1, 3, 4], [2, 4, 4]]
Make omit
a set, check for intersection on each step of iteration:
>>> full_list = [[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]]
>>> omit = [99, 60, 98]
>>> omit = set(omit) # or just omit = {99, 60, 98} for python >= 2.7
>>> [item for item in full_list if not omit & set(item)]
[[1, 1, 3, 4], [2, 4, 4]]
FYI, better use a frozenset
instead of a set as @Óscar López suggested. With frozenset
it runs a bit faster:
import timeit
def omit_it(full_list, omit):
return [item for item in full_list if not omit & set(item)]
print timeit.Timer('omit_it([[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]], {99, 60, 98})',
'from __main__ import omit_it').timeit(10000)
print timeit.Timer('omit_it([[1, 1, 3, 4], [3, 99, 5, 2],[2, 4, 4], [3, 4, 5, 2, 60]], frozenset([99, 60, 98]))',
'from __main__ import omit_it').timeit(10000)
prints:
0.0334849357605
0.0319349765778
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.