
Filter out a list from a list of lists in Python

I am new to Python and I am trying to compare the elements of a list with the elements of a list of lists.

I have a list, and a list of lists containing different orderings of the same courses (each inner list is a different topological sort).

In some cases list_of_lists is huge. What I want is to compare the courses_taken list with each list in list_of_lists and get, as a result, the elements of each inner list that are not in courses_taken. For example:

# the small list:
courses_taken = ['CS350','CS450']

# a list of lists:
list_of_lists = [['CS450', 'CS350', 'CS300', 'CS206', 'CS306'], ['CS450', 'CS350', 'CS206', 'CS306', 'CS300'], ['CS450', 'CS350', 'CS206', 'CS300', 'CS306'],...]

# the result:
result = [['CS300', 'CS206', 'CS306'], ['CS206', 'CS306', 'CS300'], ['CS206', 'CS300', 'CS306']]

From my research, I only found ways to compare courses_taken against a whole sublist, not element by element to return the non-common ones. I also found ways to compare two lists, but the same code does not work for this case.

You can create a set from courses_taken for faster membership (in) checks; this matters when courses_taken is a long list.

Then iterate over your list of lists and, for each inner list, build a new list containing only the items that are not in the set.

>>> ctset = set(courses_taken)
>>> result = [[item for item in li if item not in ctset] for li in list_of_lists]
>>>
>>> # It can also be a one-liner, but then set(courses_taken) is rebuilt for
>>> # every item checked, which throws away the benefit of using a set.
>>> result = [[item for item in li if item not in set(courses_taken)] for li in list_of_lists]
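For reuse, the same approach can be wrapped in a function that builds the set exactly once. A minimal sketch follows; remove_taken and iter_remove_taken are hypothetical names, and the generator variant is an assumption aimed at the question's note that list_of_lists can be huge, since it yields one filtered list at a time instead of holding the whole result in memory.

```python
from typing import Iterable, Iterator, List


def remove_taken(courses_taken: Iterable[str],
                 list_of_lists: Iterable[List[str]]) -> List[List[str]]:
    """Return each inner list without the courses already taken, keeping order."""
    taken = set(courses_taken)  # built once, so each lookup is O(1) on average
    return [[course for course in li if course not in taken] for li in list_of_lists]


def iter_remove_taken(courses_taken: Iterable[str],
                      list_of_lists: Iterable[List[str]]) -> Iterator[List[str]]:
    """Hypothetical lazy variant: yield one filtered list at a time."""
    taken = set(courses_taken)
    for li in list_of_lists:
        yield [course for course in li if course not in taken]


courses_taken = ['CS350', 'CS450']
list_of_lists = [['CS450', 'CS350', 'CS300', 'CS206', 'CS306'],
                 ['CS450', 'CS350', 'CS206', 'CS306', 'CS300'],
                 ['CS450', 'CS350', 'CS206', 'CS300', 'CS306']]

result = remove_taken(courses_taken, list_of_lists)
print(result)
# [['CS300', 'CS206', 'CS306'], ['CS206', 'CS306', 'CS300'], ['CS206', 'CS300', 'CS306']]
```

With a very large list_of_lists, a loop such as for remaining in iter_remove_taken(courses_taken, list_of_lists): processes each ordering as it is produced.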

To demonstrate the difference between checking membership in a list versus a set, we can set up a couple of timeit tests.

>>> from random import randint
>>> import timeit
>>> 
>>> li = list(range(5000))
>>> se = set(li)
>>> 
>>> timeit.timeit("randint(0, 5000) in li", globals=globals(), number=10**6)
33.735417196992785
>>> timeit.timeit("randint(0, 5000) in se", globals=globals(), number=10**6)
1.196909729973413
>>> 

In this case, the set lookups were roughly 28x faster (33.7 s versus 1.2 s for one million checks).
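Both timings above include the cost of calling randint, so they understate the gap between the membership checks themselves. A rough sketch of a benchmark that draws the random lookups up front, so only the in checks (plus the same loop overhead in both cases) are timed. The helper names count_hits and time_membership and the default sizes are illustrative assumptions, and absolute numbers will vary by machine.

```python
import random
import timeit


def count_hits(container, lookups):
    """Count how many lookup values are members of `container`."""
    return sum(1 for x in lookups if x in container)


def time_membership(n_items=5000, n_lookups=10_000, seed=0):
    """Return (list_seconds, set_seconds) for the same batch of lookups."""
    li = list(range(n_items))
    se = set(li)
    rng = random.Random(seed)
    # Draw the lookup values up front so randint is not part of the timing.
    # randint is inclusive, so a draw of n_items is a miss, as in the original test.
    lookups = [rng.randint(0, n_items) for _ in range(n_lookups)]

    list_time = timeit.timeit(lambda: count_hits(li, lookups), number=1)
    set_time = timeit.timeit(lambda: count_hits(se, lookups), number=1)
    return list_time, set_time


list_time, set_time = time_membership()
print(f"list: {list_time:.4f}s  set: {set_time:.4f}s  ratio: {list_time / set_time:.0f}x")
```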

This is a case where the time complexity of operations on two different data types comes into play: checking a set for membership is O(1) on average, whereas it is O(n) for a list.
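A set difference such as set(li) - ctset would also be fast, but a set has no defined order, and each inner list in the question is a topological sort, so its order is the whole point. Filtering with a comprehension against the set keeps the O(1) average lookups and preserves the order. A small sketch contrasting the two on one of the question's lists:

```python
courses_taken = ['CS350', 'CS450']
ctset = set(courses_taken)
ordering = ['CS450', 'CS350', 'CS300', 'CS206', 'CS306']

# Comprehension: O(1) average lookup per item, and the original order is kept.
kept_in_order = [item for item in ordering if item not in ctset]

# Set difference: same elements, but the result is an unordered set,
# so the topological ordering is lost.
as_set = set(ordering) - ctset

print(kept_in_order)   # ['CS300', 'CS206', 'CS306']
print(sorted(as_set))  # ['CS206', 'CS300', 'CS306'] (sorted only for display)
```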

The number of operations in this test (one million) is high, but comparable counts occur in some applications. I have a solution to a combinatorics problem that does a lot of membership checking, and it was very slow until I changed my lists to sets. So this does translate into real-world performance.

If you are curious about operations on other data types, you can check out this reference: https://wiki.python.org/moin/TimeComplexity

A really simple list comprehension would be:

>>> result = [[x for x in group if x not in courses_taken] for group in list_of_lists]
>>> # output: [['CS300', 'CS206', 'CS306'], ['CS206', 'CS306', 'CS300'], ['CS206', 'CS300', 'CS306']]
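This version tests membership against the courses_taken list itself, so each check scans that list; with only two courses taken that is cheap, and it produces the same result as filtering against a set. A quick sketch confirming the two agree on the question's data (result_list and result_set are illustrative names):

```python
courses_taken = ['CS350', 'CS450']
list_of_lists = [['CS450', 'CS350', 'CS300', 'CS206', 'CS306'],
                 ['CS450', 'CS350', 'CS206', 'CS306', 'CS300'],
                 ['CS450', 'CS350', 'CS206', 'CS300', 'CS306']]

# Membership checked against the list: O(len(courses_taken)) per item.
result_list = [[x for x in group if x not in courses_taken] for group in list_of_lists]

# Membership checked against a set built once: O(1) on average per item.
ctset = set(courses_taken)
result_set = [[x for x in group if x not in ctset] for group in list_of_lists]

print(result_list == result_set)  # True
```

The choice only starts to matter as courses_taken grows; for a handful of courses either form is fine.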
