How to efficiently check if a list is in another list of lists in Python

I have two lists (listA, listB), each composed of many lists of tuples.

E.g.:

listA = [ [(0,1), (1,2) ... ] , [(5,6), (6,10)] , ... ] # can have 5000 lists, each with 100+ tuples
listB = [...] # about the same structure

I want to loop over each list in listA and, if it is not in listB, append it to listB.

So it is something like this:

for lst in listA:
    if lst not in listB: # membership checking
        listB.append(lst)

I have hundreds of thousands of such tasks to perform, and it gets really slow as listA and listB grow. The membership check seems to be the bottleneck here. I've tried using a string '0-1' instead of a tuple of ints, but it is not getting any faster. Does anyone know how to optimize the code? Is list membership checking really slow?

Any help is greatly appreciated. Thanks!

------------- EDIT: this is what I ended up using -------------

Thank you, guys. Converting the nested lists to tuples and using a set works! But you have to be careful when looping over listA: each nested list also has to be converted to a tuple (but just for membership checking!). I still need to append the nested list as a list to listB. That is:

# first convert listB to a set of tuples
listB_as_set = {tuple(x) for x in listB}  # O(N)

for lst in listA:
    # convert the nested list to a tuple so it can be hashed
    lst_tuple = tuple(lst)
    # membership checking
    if lst_tuple not in listB_as_set:  # now O(1) on average, originally O(N)
        listB.append(lst)  # still appending as a list to listB
        listB_as_set.add(lst_tuple)  # keep the set in sync with listB

Assuming both lists have length N, and ignoring the time for converting lst to lst_tuple and appending lst to listB, we get an improvement from O(N^2) to O(N), if I'm not mistaken.
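As a rough sanity check, the set-based loop can be compared against the original loop on a small input. This is only a sketch: the function names `append_missing_naive` and `append_missing_with_set` and the sample data are made up for illustration, and both functions work on a copy rather than modifying listB in place.

```python
def append_missing_naive(list_a, list_b):
    """Original approach: linear scan of the result for every candidate."""
    result = list(list_b)
    for lst in list_a:
        if lst not in result:
            result.append(lst)
    return result


def append_missing_with_set(list_a, list_b):
    """Set-based approach: hash lookups instead of linear scans."""
    result = list(list_b)
    seen = {tuple(x) for x in result}  # hashable mirror of result
    for lst in list_a:
        key = tuple(lst)
        if key not in seen:
            seen.add(key)
            result.append(lst)  # still stored as a list
    return result


# Hypothetical sample data; note the duplicate inside list_a.
list_a = [[(0, 1), (1, 2)], [(5, 6), (6, 10)], [(0, 1), (1, 2)]]
list_b = [[(5, 6), (6, 10)], [(7, 8)]]

merged = append_missing_with_set(list_a, list_b)
print(merged)
```

Both functions should return the same list in the same order; the set only replaces the membership test.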

If you would like to store values in order to check for their existence, sets are significantly faster. So you can try this and then use the for loop; it will be faster than with a list.

listA, listB = set(listA), set(listB)  # requires hashable elements (not nested lists)

That is because a set uses a hash function to map each element to a bucket. Since Python implementations automatically resize that hash table, lookups can be constant time, O(1), on average.

Sets are significantly faster than lists when it comes to determining whether an object is in the collection, but slower than lists when it comes to iterating over their contents.
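The membership difference can be measured with the standard `timeit` module. The sketch below uses made-up data of roughly the size mentioned in the question (5000 lists of 100 tuples); the actual timings depend on the machine, so no numbers are claimed here.

```python
import timeit

# Hypothetical data: 5000 lists of 100 tuples each.
data = [[(i, j) for j in range(100)] for i in range(5000)]
as_set = {tuple(x) for x in data}

probe = data[-1]          # worst case for the list: the match is at the very end
probe_key = tuple(probe)  # hashable form of the same item

# Time 100 membership tests against the list and against the set.
list_time = timeit.timeit(lambda: probe in data, number=100)
set_time = timeit.timeit(lambda: probe_key in as_set, number=100)

print(f"list membership: {list_time:.4f} s")
print(f"set membership:  {set_time:.4f} s")
```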


If you're using nested lists, you can try

listA = [[(0, 1), (1, 2)], [(5, 6), (6, 10)]]
listA = {tuple(i) for i in listA}

Or

listA = {frozenset(i) for i in listA} 

The frozenset type is immutable and hashable, and it ignores element order, so

frozenset([(0, 1), (1, 2)]) == frozenset([(1, 2), (0, 1)])  # True
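The choice between tuple and frozenset changes what counts as "the same" nested list. A small sketch, with made-up variable names, of how the two keys behave when the same tuples appear in a different order:

```python
a = [(0, 1), (1, 2)]
b = [(1, 2), (0, 1)]  # same tuples, different order

print(tuple(a) == tuple(b))          # False: tuples compare element by element
print(frozenset(a) == frozenset(b))  # True: frozensets ignore order

nested = [a, b]
tuple_keys = {tuple(x) for x in nested}
frozenset_keys = {frozenset(x) for x in nested}
print(len(tuple_keys))      # 2
print(len(frozenset_keys))  # 1
```

So tuple keys are the closer match to the original `lst not in listB` test, which is order-sensitive; frozenset keys only fit if order (and repeated tuples) inside a nested list should not matter.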

Hope this helps.

The way you are doing it now, it's an O(N^2) operation because of the nature of lists. But if you use sets, it becomes approximately O(n + m). See here for details: https://wiki.python.org/moin/TimeComplexity

So the approach is

a = set(listA)
b = set(listB)

b |= a  # in-place union; a bare b.union(a) returns a new set and leaves b unchanged

Just three lines of code and much faster too. A good point was raised by AChampion about unhashable lists. In that case

a = set([ tuple(x) for x in listA ])

would work.
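Putting the union and the tuple conversion together might look like the sketch below. The sample data is made up, and converting back to a list of lists at the end is an assumption about what the caller needs. Note that a set does not preserve the original order of listB, so this only fits when order is irrelevant.

```python
# Hypothetical sample data in the same shape as the question.
listA = [[(0, 1), (1, 2)], [(5, 6), (6, 10)]]
listB = [[(5, 6), (6, 10)], [(7, 8)]]

# Nested lists are unhashable, so convert each one to a tuple first.
a = {tuple(x) for x in listA}
b = {tuple(x) for x in listB}

merged = b | a  # union: every distinct nested list from either input

# Convert back to a list of lists if the rest of the code expects lists.
merged_lists = [list(t) for t in merged]
print(sorted(merged_lists))  # sorted only to get a stable display order
```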
