
Fast iterating over the first n items of an iterable (not a list) in Python

I'm looking for a Pythonic way of iterating over the first n items of an iterable (upd: not a list in the common case, since for lists things are trivial), and it's quite important to do this as fast as possible. This is how I do it now:

count = 0
for item in iterable:
    do_something(item)
    count += 1
    if count >= n: break

Doesn't seem neat to me. Another way of doing this is:

for item in itertools.islice(iterable, n):
    do_something(item)

This looks good, but the question is whether it is fast enough to use with generators. For example:

pair_generator = lambda iterable: itertools.izip(*[iter(iterable)]*2)
for item in itertools.islice(pair_generator(iterable), n):
    do_something(item)

Will it run fast enough compared to the first method? Is there an easier way to do it?

for item in itertools.islice(iterable, n): is the most obvious, easy way to do it. It works for arbitrary iterables and is O(n), as any sane solution would be.

It's conceivable that another solution could have better performance; we wouldn't know without timing. I wouldn't recommend bothering with timing unless you profile your code and find this call to be a hotspot. Unless it's buried within an inner loop, it is highly doubtful that it will be. Premature optimization is the root of all evil.


If I were going to look for alternate solutions, I would look at ones like for count, item in enumerate(iterable): if count >= n: break ... and for i in xrange(n): item = next(iterator) ... . I wouldn't guess these would help, but they seem worth trying if we really want to compare things. If I were stuck in a situation where I had profiled and found this to be a hotspot in an inner loop (is this really your situation?), I would also try to speed up the name lookup by binding itertools.islice to a local name instead of looking up the islice attribute on the global itertools each time.
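For concreteness, a rough sketch of what those alternatives might look like (assuming iterable, n, and do_something are defined as in the question, and that each variant consumes a fresh iterable):

from itertools import islice

# Alternative 1: enumerate with an explicit break (stops after the first n items).
for count, item in enumerate(iterable):
    if count >= n:
        break
    do_something(item)

# Alternative 2: advance an explicit iterator n times with next().
# Note: unlike islice, this raises StopIteration if the iterable has
# fewer than n items.
iterator = iter(iterable)
for i in xrange(n):
    do_something(next(iterator))

# Alternative 3: bind islice to a local name, avoiding the repeated
# attribute lookup on the itertools module inside a hot loop.
for item in islice(iterable, n):
    do_something(item)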

These are things you only do after you've proven they'll help. People often try doing them at other times; it doesn't make their programs appreciably faster, it just makes their programs worse.

itertools tends to be the fastest solution, when directly applicable.

Obviously, the only way to check is to benchmark -- e.g., save the following as aaa.py:

import itertools

def doit1(iterable, n, do_something=lambda x: None):
    count = 0
    for item in iterable:
        do_something(item)
        count += 1
        if count >= n: break

def doit2(iterable, n, do_something=lambda x: None):
    for item in itertools.islice(iterable, n):
        do_something(item)

pair_generator = lambda iterable: itertools.izip(*[iter(iterable)]*2)

def dd1(itrbl=range(44)): doit1(itrbl, 23)
def dd2(itrbl=range(44)): doit2(itrbl, 23)

and see...:

$ python -mtimeit -s'import aaa' 'aaa.dd1()'
100000 loops, best of 3: 8.82 usec per loop
$ python -mtimeit -s'import aaa' 'aaa.dd2()'
100000 loops, best of 3: 6.33 usec per loop

so clearly, itertools is faster here -- benchmark with your own data to verify.
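Note that aaa.py defines pair_generator but never times it; if one also wanted to measure the paired case from the question, a possible extension (same Python 2 setup, with hypothetical doit3/dd3 names) could be:

def doit3(iterable, n, do_something=lambda x: None):
    # Time islice over the pair-producing generator from the question.
    for item in itertools.islice(pair_generator(iterable), n):
        do_something(item)

def dd3(itrbl=range(44)): doit3(itrbl, 23)

and then run it the same way: python -mtimeit -s'import aaa' 'aaa.dd3()'.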

BTW, I find timeit MUCH more usable from the command line, so that's how I always use it -- it then runs the right "order of magnitude" of loops for the kind of speeds you're specifically trying to measure, be that 10, 100, 1000, and so on -- here, to distinguish a couple of microseconds of difference, a hundred thousand loops is about right.

If it is a list, you can use slicing:

list[:n]

You can use enumerate to write essentially the same loop you have, but in a simpler, more Pythonic way:

for idx, val in enumerate(iterableobj):
    if idx >= n:
        break
    do_something(val)

Of a list? Try

for k in mylist[0:n]:
    # do stuff with k

You can also use a comprehension if you need to:

my_new_list = [blah(k) for k in mylist[0:n]]
