Fast iteration over the first n items of an iterable (not a list) in Python

Question

I'm looking for a Pythonic way of iterating over the first n items of an iterable (update: not a list in the common case, since for lists this is trivial), and it's quite important to do this as fast as possible. This is how I do it now:

count = 0
for item in iterable:
    do_something(item)
    count += 1
    if count >= n:
        break

That doesn't seem neat to me. Another way of doing this is:

for item in itertools.islice(iterable, n):
    do_something(item)

This looks good; the question is whether it is fast enough to use with generators. For example:

# Python 2: itertools.izip (in Python 3 the built-in zip is already lazy)
pair_generator = lambda iterable: itertools.izip(*[iter(iterable)]*2)
for item in itertools.islice(pair_generator(iterable), n):
    do_something(item)

Will it run fast enough compared to the first method? Is there an easier way to do it?
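For readers unfamiliar with the pair idiom in the question, here is a minimal Python 3 sketch of what it produces. It assumes the built-in zip as a stand-in for Python 2's itertools.izip, and the sample input range(10) is purely illustrative.

```python
import itertools

def pair_generator(iterable):
    # Both zip arguments are the *same* iterator object, so each output
    # tuple is built from two consecutive items of the input.
    it = iter(iterable)
    return zip(it, it)

# Take only the first 3 pairs; the rest of the input is never touched.
first_pairs = list(itertools.islice(pair_generator(range(10)), 3))
print(first_pairs)  # [(0, 1), (2, 3), (4, 5)]
```

Because both zip and islice are lazy, nothing beyond the first six input items is read here.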

Answer 1

for item in itertools.islice(iterable, n): is the most obvious, easy way to do it. It works for arbitrary iterables and is O(n), as any sane solution would be.
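As a small illustration of "works for arbitrary iterables", the sketch below applies islice to an infinite generator. The squares generator is a hypothetical example, not something from the question, and the code is written for Python 3.

```python
import itertools

def squares():
    # Hypothetical infinite generator, used only for illustration.
    for i in itertools.count():
        yield i * i

gen = squares()
first_five = list(itertools.islice(gen, 5))
sixth = next(gen)

print(first_five)  # [0, 1, 4, 9, 16]
print(sixth)       # 25
```

islice stops after exactly five items, so the sixth value is still available from the underlying generator afterwards.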

It's conceivable that another solution could have better performance; we wouldn't know without timing. I wouldn't recommend bothering with timing unless you profile your code and find this call to be a hotspot. Unless it's buried within an inner loop, it is highly doubtful that it will be. Premature optimization is the root of all evil.

If I were going to look for alternate solutions, I would look at ones like for count, item in enumerate(iterable): if count >= n: break ... and for i in xrange(n): item = next(iterator) ... . I wouldn't guess these would help, but they seem worth trying if we really want to compare things. If I were stuck in a situation where I profiled and found this was a hotspot in an inner loop (is this really your situation?), I would also try to cut the name lookup by binding the islice attribute of the global itertools module to a local name ahead of time.
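The alternatives mentioned above could look roughly like this in Python 3 (range instead of xrange). The function names take_enumerate, take_next, and take_islice are hypothetical, and binding islice through a default argument is only one possible way to get a local name.

```python
from itertools import islice  # module-level name, no itertools.islice attribute lookup

def take_enumerate(iterable, n, do_something):
    for count, item in enumerate(iterable):
        if count >= n:
            break
        do_something(item)

def take_next(iterable, n, do_something):
    iterator = iter(iterable)
    for _ in range(n):
        try:
            item = next(iterator)
        except StopIteration:
            break  # the iterable had fewer than n items
        do_something(item)

def take_islice(iterable, n, do_something, _islice=islice):
    # _islice is a local name inside the function (bound once, at definition time).
    for item in _islice(iterable, n):
        do_something(item)

results = {}
for fn in (take_enumerate, take_next, take_islice):
    seen = []
    fn(iter(range(10)), 3, seen.append)
    results[fn.__name__] = seen
print(results)
```

Note that the islice name is looked up once per loop, not once per item, so any gain from the local binding is likely to be small; only a measurement on real data can say whether it matters.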

These are things you only do after you've proven they'll help. People often try them at other times. It doesn't make their programs appreciably faster; it just makes their programs worse.

Answer 2

itertools tends to be the fastest solution, when directly applicable.

Obviously, the only way to check is to benchmark, e.g., save in aaa.py:

import itertools

def doit1(iterable, n, do_something=lambda x: None):
    count = 0
    for item in iterable:
        do_something(item)
        count += 1
        if count >= n:
            break

def doit2(iterable, n, do_something=lambda x: None):
    for item in itertools.islice(iterable, n):
        do_something(item)

# Python 2: itertools.izip (in Python 3 the built-in zip is already lazy)
pair_generator = lambda iterable: itertools.izip(*[iter(iterable)]*2)

def dd1(itrbl=range(44)): doit1(itrbl, 23)
def dd2(itrbl=range(44)): doit2(itrbl, 23)

and see:

$ python -mtimeit -s'import aaa' 'aaa.dd1()'
100000 loops, best of 3: 8.82 usec per loop
$ python -mtimeit -s'import aaa' 'aaa.dd2()'
100000 loops, best of 3: 6.33 usec per loop

So clearly, itertools is faster here; benchmark with your own data to verify.

BTW, I find timeit MUCH more usable from the command line, so that's how I always use it: it then runs the right "order of magnitude" of loops for the kind of speeds you're specifically trying to measure, be that 10, 100, 1000, and so on. Here, to distinguish a difference of roughly two and a half microseconds, a hundred thousand loops is about right.
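For readers who would rather drive the comparison from a script, here is a rough Python 3 sketch using the timeit module. The bench helper and its number/repeat defaults are assumptions made for illustration; absolute timings will differ from the figures above, which come from a 2010-era Python 2 run.

```python
import itertools
import timeit

def doit1(iterable, n, do_something=lambda x: None):
    count = 0
    for item in iterable:
        do_something(item)
        count += 1
        if count >= n:
            break

def doit2(iterable, n, do_something=lambda x: None):
    for item in itertools.islice(iterable, n):
        do_something(item)

def bench(func, number=20000, repeat=3):
    # Hypothetical helper: best of `repeat` runs, in microseconds per call.
    data = range(44)
    timings = timeit.repeat(lambda: func(data, 23), number=number, repeat=repeat)
    return min(timings) / number * 1e6

for func in (doit1, doit2):
    print(f"{func.__name__}: {bench(func):.2f} usec per call")
```

The lambda wrapper adds the same small overhead to both measurements, so the comparison between the two functions stays meaningful even though the absolute numbers are slightly inflated.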

Answer 3

If it's a list, then you can use slicing:

list[:n]

Answer 4

You can use enumerate to write essentially the same loop you have, but in a simpler, more Pythonic way:

for idx, val in enumerate(iterableobj):
    if idx >= n:  # idx starts at 0, so idx == n is already the (n+1)th item
        break
    do_something(val)
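One subtlety of the enumerate approach is worth a quick sketch (Python 3; the helper name take_with_enumerate is hypothetical). The loop has to fetch one extra item before it can notice that the limit was reached, whereas islice stops without touching it. This only matters when the same iterator is used again afterwards.

```python
from itertools import islice

def take_with_enumerate(iterator, n):
    taken = []
    for idx, val in enumerate(iterator):
        if idx >= n:
            break  # val has already been pulled from the iterator here
        taken.append(val)
    return taken

it_a = iter(range(10))
taken_a = take_with_enumerate(it_a, 3)
after_a = next(it_a)

it_b = iter(range(10))
taken_b = list(islice(it_b, 3))
after_b = next(it_b)

print(taken_a, after_a)  # [0, 1, 2] 4  (item 3 was fetched and discarded)
print(taken_b, after_b)  # [0, 1, 2] 3
```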

Answer 5

For a list? Try:

for k in mylist[0:n]:
    # do stuff with k

You can also use a comprehension if you need to:

my_new_list = [blah(k) for k in mylist[0:n]]
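Since the question is about iterables that are not lists, a short Python 3 sketch of where slicing stops working may help. The sample data and variable names are illustrative only.

```python
from itertools import islice

mylist = [10, 20, 30, 40, 50]
n = 3

from_list = mylist[:n]  # slicing copies the first n elements into a new list
print(from_list)        # [10, 20, 30]

gen = (x * 2 for x in mylist)
slicing_error = None
try:
    gen[:n]  # generators do not support subscripting
except TypeError as exc:
    slicing_error = exc
    print("slicing failed:", exc)

from_gen = list(islice(gen, n))  # islice works on any iterable
print(from_gen)                  # [20, 40, 60]
```

For a real list both forms work; the slice builds a new n-element list up front, while islice yields items one at a time.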

5 answers

Answer 1
15 accepted 2010-04-23 22:00:43

Answer 2
6 2010-04-23 22:03:08

Answer 3
2 2010-04-23 21:50:00

Answer 4
2 2010-04-23 22:12:38

Answer 5
1 2010-04-23 21:49:52
