
Filter out a list from a list of lists in Python

I am new to Python and I am trying to compare the elements of a list with the elements of a list of lists.

I have a list and a list of lists containing different orderings of the same courses (each sublist is a different topological sort).

In some cases list_of_lists is huge. What I want to do is compare the courses_taken list with list_of_lists and get, for each sublist, the elements that are not in courses_taken. For example:

# the small list:
courses_taken = ['CS350','CS450']

# a list of lists:
list_of_lists =[['CS450', 'CS350', 'CS300', 'CS206', 'CS306'], ['CS450', 'CS350', 'CS206', 'CS306', 'CS300'], ['CS450', 'CS350', 'CS206', 'CS300', 'CS306'],...]

# the result:
result = [['CS300', 'CS206', 'CS306'], ['CS206', 'CS306', 'CS300'], [ 'CS206', 'CS300', 'CS306']]

From the research I did, I only found ways to compare courses_taken with a whole sublist, not with each individual element so as to return the non-common ones. I also found ways to compare two lists, but the same code does not work for this case.

You can create a set from courses_taken for faster membership tests (the in operator). This matters if courses_taken is a long list.

Then just iterate over your list of lists and build a new list, keeping only the items that are not in the set.

>>> ctset = set(courses_taken)
>>> result = [[item for item in li if item not in ctset] for li in list_of_lists]
>>>
>>> # Or as a one-liner. Note that this rebuilds set(courses_taken) for every item, so it is slower than the version above.
>>> result = [[item for item in li if item not in set(courses_taken)] for li in list_of_lists]
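As a self-contained sketch of the approach above, the filtering can be wrapped in a small function. The name remove_taken is hypothetical (it does not appear in the original answer), and the sample data is just the three sublists shown in the question.

```python
def remove_taken(list_of_lists, courses_taken):
    """Return a copy of each sublist without the courses already taken.

    remove_taken is a hypothetical helper name used for illustration only.
    """
    taken = set(courses_taken)  # build the set once, outside the loops
    return [[course for course in ordering if course not in taken]
            for ordering in list_of_lists]


courses_taken = ['CS350', 'CS450']
list_of_lists = [
    ['CS450', 'CS350', 'CS300', 'CS206', 'CS306'],
    ['CS450', 'CS350', 'CS206', 'CS306', 'CS300'],
    ['CS450', 'CS350', 'CS206', 'CS300', 'CS306'],
]

result = remove_taken(list_of_lists, courses_taken)
print(result)
```

The order of the remaining courses within each sublist is preserved, which matters here because each sublist is a topological sort.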

To demonstrate the difference between checking a list for membership and checking a set, we can set up a couple of timeit tests.

>>> from random import randint
>>> import timeit
>>> 
>>> li = list(range(5000))
>>> se = set(li)
>>> 
>>> timeit.timeit("randint(0, 5000) in li", globals=globals(), number=10**6)
33.735417196992785
>>> timeit.timeit("randint(0, 5000) in se", globals=globals(), number=10**6)
1.196909729973413
>>> 

In this case, the set lookups were roughly 28x faster (about 33.7 s versus 1.2 s).

This is a case where the time complexity of the same operation on two different data types comes into play. Checking membership in a set is an O(1) operation on average, whereas it is an O(n) operation for a list.
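The timings above also include the cost of calling randint on every iteration. A rough sketch that generates the probe values up front, so that only the membership tests are timed, might look like the following. The names values, value_set, probes and count_hits are illustrative, and the absolute numbers will vary by machine.

```python
import random
import timeit

values = list(range(5000))
value_set = set(values)

# Generate the probe values once so random number generation is not timed.
random.seed(0)
probes = [random.randint(0, 5000) for _ in range(1000)]


def count_hits(container):
    """Count how many probe values are found in the container."""
    return sum(1 for p in probes if p in container)


# Each call performs 1000 membership tests; repeat it a few times.
list_time = timeit.timeit(lambda: count_hits(values), number=5)
set_time = timeit.timeit(lambda: count_hits(value_set), number=5)

print(f"list: {list_time:.4f}s  set: {set_time:.4f}s")
```

Both containers give the same hit count; only the time per lookup differs.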

The number of operations in this test is pretty high, but it is comparable to what some applications do. I have a solution to a combinatorics problem that involves a lot of membership checking, and it was very slow until I changed my lists to sets. So this does translate into real-world application performance.

If you are curious about operations on other data types, you can check out this reference: https://wiki.python.org/moin/TimeComplexity

A really simple list comprehension would be:

>>> result = [[x for x in group if x not in courses_taken] for group in list_of_lists]
>>> # output: [['CS300', 'CS206', 'CS306'], ['CS206', 'CS306', 'CS300'], ['CS206', 'CS300', 'CS306']]


 