Why is converting a list to a set faster than using just list to compute a list difference?

Say I wish to compute the difference of two lists, C = A - B:

A = [1,2,3,4,5,6,7,8,9] 
B = [1,3,5,8,9]
C = [2,4,6,7]          #Result

A and B are both sorted lists of unique integers (I'm not sure if there is a way to tell Python about this property of the lists). I need to preserve the order of the elements. AFAIK there are two possible ways of doing it:

Method 1: Convert B into a set and use a list comprehension to generate C:

s = set(B)
C = [x for x in A if x not in s]

Method 2: Directly use a list comprehension:

C = [x for x in A if x not in B]

Why is #1 more efficient than #2? Isn't there an overhead to convert to a set? What am I missing here?

Some performance benchmarks are given in this answer.

UPDATE: I'm aware that a set's average O(1) lookup time beats a list's O(n), but if the original list A contains about a million or so integers, wouldn't the set creation actually take longer?

There is overhead to convert a list to a set, but a set is substantially faster than a list for those in tests.

You can check almost instantly whether item x is in set y because there's a hash table being used underneath. No matter how large your set is, the lookup time is on average the same - this is known in Big-O notation as O(1). For a list, you have to check each element in turn to see if item x is in list z. As your list grows, the check takes longer - this is O(n), meaning the time the operation takes grows in proportion to the length of the list.

Building the set is a single pass over the list, roughly the cost of one unsuccessful list lookup. The increased lookup speed therefore more than offsets the set creation overhead, which is how your set check ends up being faster.
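A rough way to see this trade-off is to time the set construction separately from the two methods. The sketch below uses the standard-library timeit module; the list sizes and the names method1, method2, build_time, method1_time and method2_time are arbitrary choices for illustration, and the actual numbers will vary by machine.

```python
import timeit

# Illustrative sizes only; not taken from the question.
A = list(range(2_000))
B = list(range(0, 2_000, 2))

def method1():
    s = set(B)                           # one-off conversion, O(len(B)) on average
    return [x for x in A if x not in s]  # each test is O(1) on average

def method2():
    return [x for x in A if x not in B]  # each test scans B, O(len(B))

# Time the conversion alone, then each full method (10 runs each).
build_time = timeit.timeit(lambda: set(B), number=10)
method1_time = timeit.timeit(method1, number=10)
method2_time = timeit.timeit(method2, number=10)

print(f"set(B) only: {build_time:.4f}s")
print(f"method 1:    {method1_time:.4f}s")
print(f"method 2:    {method2_time:.4f}s")
```

On typical hardware the conversion should be a small fraction of Method 1's total time, and Method 1 as a whole should come out well ahead of Method 2 once the lists are more than a handful of elements long.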

EDIT: to answer that other question, Python has no way of determining that your list is sorted - not if you're using a standard list object, anyway. So it can't achieve O(log n) performance with a list comprehension. If you wanted to write your own binary search method which assumes the list is sorted, you can certainly do so, but O(1) beats O(log n) any day.
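For reference, a binary search version might look like the sketch below. It relies on the standard-library bisect module and assumes B is already sorted; the helper name in_sorted is hypothetical, not something Python provides.

```python
from bisect import bisect_left

def in_sorted(sorted_list, x):
    """Return True if x is in sorted_list, using binary search (O(log n)).

    Assumes sorted_list is sorted in ascending order; the result is
    meaningless otherwise.
    """
    i = bisect_left(sorted_list, x)   # leftmost position where x could be inserted
    return i < len(sorted_list) and sorted_list[i] == x

A = [1, 2, 3, 4, 5, 6, 7, 8, 9]
B = [1, 3, 5, 8, 9]

# Same shape as the list comprehensions in the question, with an O(log n) test.
C = [x for x in A if not in_sorted(B, x)]
print(C)  # [2, 4, 6, 7]
```

This brings the whole difference down to O(n log n), which is better than the plain list scan but still behind the set-based approach on average.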

The average time complexity of a lookup (x in S) in a set is O(1), while for a list it is O(n).

You can check the details at https://wiki.python.org/moin/TimeComplexity

According to the Python documentation on time complexity:

  • List membership x in s is on average a linear-time operation, or O(n).
  • Set membership x in s is on average a constant-time operation, or O(1).

Building a set is on average a linear-time operation, because every element of the list has to be scanned and inserted into a hash table, so O(n). Here n is the number of elements in the collection.

The key observation is that, in Method 1, building the set, s = set(B), is a one-off operation. After that we have n set-membership tests of the form x not in s, so in total O(n) + n * O(1), or O(n) time complexity.

Whereas in Method 2, the list-membership test x not in B is carried out for each element in A, so in total n * O(n) = O(n^2) time complexity.
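The quadratic growth of Method 2 can be made concrete by counting comparisons instead of measuring time. The sketch below is a simplified model of what x not in B does (a left-to-right scan that stops at the first match); the function names list_scan_comparisons and make_lists and the choice of test lists are assumptions for illustration only.

```python
def list_scan_comparisons(A, B):
    """Count the == checks needed to test every x in A against list B."""
    comparisons = 0
    for x in A:
        for y in B:
            comparisons += 1
            if x == y:       # found: a real `x in B` scan stops here too
                break
    return comparisons

def make_lists(n):
    """Hypothetical test data: A has n integers, B holds every other one."""
    A = list(range(n))
    B = list(range(0, n, 2))
    return A, B

# Doubling n should roughly quadruple the number of comparisons.
for n in (500, 1_000, 2_000):
    A, B = make_lists(n)
    print(n, list_scan_comparisons(A, B))
```

With a set, the corresponding count would grow roughly in step with n rather than with n^2, since each membership test costs about the same regardless of the size of the set.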
