
Why is this slicing code faster than more procedural code?

I have a Python function that takes a list and returns a generator yielding 2-tuples of each adjacent pair, e.g.

>>> list(pairs([1, 2, 3, 4]))
[(1, 2), (2, 3), (3, 4)]

I've considered an implementation using 2 slices:

def pairs(xs):
    for p in zip(xs[:-1], xs[1:]): 
        yield p

and one written in a more procedural style:

def pairs(xs):
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last,x
        last = x

Testing using range(2 ** 15) as input yields the following times (you can find my testing code and output here):

2 slices: 100 loops, best of 3: 4.23 msec per loop
0 slices: 100 loops, best of 3: 5.68 msec per loop

Part of the performance hit for the sliceless implementation is the comparison in the loop ( if last is not dummy ). Removing that (making the output incorrect) improves its performance, but it's still slower than the zip-a-pair-of-slices implementation:

2 slices: 100 loops, best of 3: 4.48 msec per loop
0 slices: 100 loops, best of 3: 5.2 msec per loop

So, I'm stumped. Why is zipping together 2 slices, effectively iterating over the list twice in parallel, faster than iterating once, updating last and x as you go?
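For reference, the comparison can be reproduced with a minimal timeit harness like the following (a sketch, not the linked testing code; the function names are mine, and absolute times will vary with hardware and interpreter version):

```python
import timeit

def pairs_slices(xs):
    # the 2-slice version
    for p in zip(xs[:-1], xs[1:]):
        yield p

def pairs_loop(xs):
    # the procedural version with the sentinel comparison
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last, x
        last = x

xs = list(range(2 ** 15))
for name, fn in [("2 slices", pairs_slices), ("0 slices", pairs_loop)]:
    # consume the generator fully on each run, as list(pairs(...)) does
    t = timeit.timeit(lambda: list(fn(xs)), number=10)
    print("%s: %.2f msec per loop" % (name, t / 10 * 1000))
```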

EDIT

Dan Lenski proposed a third implementation:

def pairs(xs):
    for ii in range(1,len(xs)):
        yield xs[ii-1], xs[ii]

Here's its comparison to the other implementations:

2 slices: 100 loops, best of 3: 4.37 msec per loop
0 slices: 100 loops, best of 3: 5.61 msec per loop
Lenski's: 100 loops, best of 3: 6.43 msec per loop

It's even slower, which is baffling to me.

EDIT 2:

ssm suggested using itertools.izip instead of zip, and it's even faster than zip:

2 slices, izip: 100 loops, best of 3: 3.68 msec per loop

So, izip is the winner so far! But still for difficult-to-inspect reasons.
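(Note for later readers: in Python 3, itertools.izip is gone and the builtin zip is already lazy, so the izip variant is simply the zip variant. The same lazy pairing can also be written with the classic itertools.tee recipe — my own sketch, not from the thread:)

```python
from itertools import tee

def pairs(xs):
    # duplicate the iterator, advance one copy by a single element,
    # then lazily zip the two copies back together
    a, b = tee(xs)
    next(b, None)
    return zip(a, b)

print(list(pairs([1, 2, 3, 4])))  # [(1, 2), (2, 3), (3, 4)]
```

Unlike the slicing version, this works on arbitrary iterables, not just lists.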

This is the result for the izip version, which is actually closer to your implementation. It looks like what you would expect. The zip version creates the entire list in memory within the function, so it is the fastest. The loop version just loops through the list, so it is a little slower. The izip version most closely resembles your code, but I am guessing there are some memory-management backend processes which increase the execution time.

In [11]: %timeit pairsLoop([1,2,3,4,5])
1000000 loops, best of 3: 651 ns per loop

In [12]: %timeit pairsZip([1,2,3,4,5])
1000000 loops, best of 3: 637 ns per loop

In [13]: %timeit pairsIzip([1,2,3,4,5])
1000000 loops, best of 3: 655 ns per loop

The versions of the code are shown below as requested:

from itertools import izip


def pairsIzip(xs):
    for p in izip(xs[:-1], xs[1:]): 
        yield p

def pairsZip(xs):
    for p in zip(xs[:-1], xs[1:]): 
        yield p

def pairsLoop(xs):
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last,x
        last = x

Lots of interesting discussion elsewhere in this thread. Basically, we started out comparing two versions of this function, which I'm going to describe with the following dumb names:

  1. The " zip -py" version: zip -py”版本:

     def pairs(xs):
         for p in zip(xs[:-1], xs[1:]):
             yield p
  2. The "loopy" version: “ loopy”版本:

     def pairs(xs):
         last = object()
         dummy = last
         for x in xs:
             if last is not dummy:
                 yield last, x
             last = x

So why does the loopy version turn out to be slower? Basically, I think it comes down to a couple of things:

  1. The loopy version explicitly does extra work: it compares two objects' identities ( if last is not dummy: ... ) on every pair-generating iteration of the inner loop.

    • @mambocab's edit shows that not doing this comparison does make the loopy version
      slightly faster, but doesn't fully close the gap.

  2. The zippy version does more stuff in compiled C code than the loopy version does in Python code:

    • Combining two objects into a tuple. The loopy version does yield last,x , while in the zippy version the tuple p comes straight from zip , so it just does yield p .

    • Binding variable names to objects: the loopy version does this twice in every loop, assigning x in the for loop and again in last = x . The zippy version does this just once, in the for loop.

  3. Interestingly, there is one way in which the zippy version is actually doing more work: it uses two listiterators, iter(xs[:-1]) and iter(xs[1:]) , which get passed to zip . The loopy version only uses one listiterator ( for x in xs ).

    • Creating a listiterator object (the output of iter([]) ) is likely a very highly optimized operation since Python programmers use it so frequently.
    • Iterating over list slices, xs[:-1] and xs[1:] , is a very lightweight operation which adds almost no overhead compared to iterating over the whole list. Essentially, it just means moving the starting or ending point of the iterator, but not changing what happens on each iteration.
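The extra per-pair bytecode in the loopy version can be seen directly with the dis module (a sketch; exact opcodes vary across CPython versions, so the code only compares instruction counts):

```python
import dis

def loopy(xs):
    last = object()
    dummy = last
    for x in xs:
        if last is not dummy:
            yield last, x
        last = x

def zippy(xs):
    for p in zip(xs[:-1], xs[1:]):
        yield p

# The loopy generator runs an identity comparison, a tuple build, and
# two name bindings per pair inside the interpreter loop; the zippy
# generator's per-pair work is essentially just iterate, store, yield.
n_loopy = len(list(dis.get_instructions(loopy)))
n_zippy = len(list(dis.get_instructions(zippy)))
print(n_loopy, n_zippy)  # the loopy version compiles to more bytecode
```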

I suspect the main reason that the second version is slower is that it does a comparison operation for every single pair that it yields:

# pair-generating loop
for x in xs:
    if last is not dummy:
       yield last,x
    last = x

The first version does not do anything but spit out values. With the loop variables renamed, it's equivalent to this:

# pair-generating loop
for last,x in zip(xs[:-1], xs[1:]):
    yield last,x 

It's not especially pretty or Pythonic, but you could write a procedural version without a comparison in the inner loop. How fast does this one run?

def pairs(xs):
    for ii in range(1,len(xs)):
        yield xs[ii-1], xs[ii]
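For completeness, a single-pass procedural variant with no per-pair comparison can also be written by priming the loop with one next() call (my own sketch, not timed in the thread):

```python
def pairs(xs):
    it = iter(xs)
    try:
        last = next(it)      # grab the first element up front
    except StopIteration:
        return               # empty input: yield nothing
    for x in it:
        yield last, x        # no sentinel test needed in the loop body
        last = x

print(list(pairs([1, 2, 3, 4])))  # [(1, 2), (2, 3), (3, 4)]
```

Like the loopy version, this still pays for a tuple build and two name bindings per pair in Python-level bytecode, so it avoids only the identity comparison.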
