
Python list comprehension vs generator

I found the question "Generators vs List Comprehension performance in Python" and repeated the measurement using timeit in addition to cProfile.

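from timeit import timeit
import cProfile

# Python 2: time a single run of each form (note: these two use range)
print timeit('sum([i for i in range(9999999)])', number=1)
print timeit('sum((i for i in range(9999999)))', number=1)

# Profile each form (note: these two use xrange).
# cProfile.run prints its own report and returns None, so print also emits "None".
print cProfile.run('sum([i for i in xrange(9999999)])')
print cProfile.run('sum((i for i in xrange(9999999)))')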

The result is:

LC timeit 0.728941202164
G timeit 0.643975019455
LC cProfile          3 function calls in 0.751 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.673    0.673    0.751    0.751 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.078    0.078    0.078    0.078 {sum}


None
G cProfile          10000003 function calls in 1.644 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
 10000000    0.843    0.000    0.843    0.000 <string>:1(<genexpr>)
        1    0.000    0.000    1.644    1.644 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.801    0.801    1.644    1.644 {sum}

I believed a generator should be better than a list comprehension, so why is the result unclear in this case? My question is which of these is better to write:

sum((i for i in list_of_i))   # uses one loop

vs

sum([i for i in list_of_i])   # seems to take two loops: one to build the list and one to sum it

In the simple case, it will be fastest to do this without a comprehension/generator:

sum(xrange(9999999))

Normally, if I need to do some sort of operation where I need to choose between a comprehension and generator expression, I do:

sum(a*b for a, b in zip(c, d))

Personally, I think that the generator expression (without the extra parentheses [1]) looks nicer, and since readability counts, this outweighs any micro performance differences between the two expressions.

Generators will frequently be faster for things like this because they avoid creating an intermediate list (and the memory allocation associated with it). The timing difference is probably more pronounced as the list gets bigger, since memory allocation and list resizing take more time for bigger lists. This isn't always the case, however: it is well documented on StackOverflow that str.join works faster with lists than with generators in CPython, because when str.join gets a generator, it constructs the list anyway.
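To make the memory point concrete, here is a minimal sketch. It is written in Python 3 syntax (where range is lazy, like xrange in Python 2), and the size n and the variable names are illustrative assumptions rather than part of the original benchmark. Exact byte counts depend on the interpreter build, so none are shown.

```python
import sys

n = 100000  # illustrative size, much smaller than the benchmark above

as_list = [i for i in range(n)]  # materializes all n items up front
as_gen = (i for i in range(n))   # creates only a small generator object

# getsizeof reports the container itself (for the list, its pointer array),
# not the integers it refers to.
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)  # does not grow with n
print(list_bytes, gen_bytes)

# Both forms produce the same total.
total_from_list = sum(as_list)
total_from_gen = sum(as_gen)
```

The generator object stays the same size no matter how large n is, while the list grows with it. That says nothing about speed on its own, only about the allocation the generator avoids.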

[1] You can omit the parentheses any time you pass a generator expression to a function as its only argument, which happens more frequently than you might expect.
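A small sketch of that parenthesis rule, using made-up values:

```python
values = [1, 2, 3, 4]  # illustrative data

# Sole argument: the call's own parentheses are enough.
squares_total = sum(v * v for v in values)

# With a second argument (here sum's start value), the generator
# expression needs its own parentheses; leaving them out is a SyntaxError.
shifted_total = sum((v * v for v in values), 100)

print(squares_total, shifted_total)
```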

Generators load lazily; you have to make a call to get their next value every time you want it.

sum is an aggregate function, which operates on the entire iterable. You have to have all of the values available for it to do its work.

The reason that the list comprehension works faster is that there's only one explicit call to get the entire list, and one explicit operation to sum them all. With the generator, however, sum has to fetch every item before it can finish its aggregation, and since there are about ten million of them, that results in about ten million calls.
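The per-item fetching can be observed directly. The sketch below uses a hypothetical CountingIterator wrapper (not part of any library) to count how many times sum asks for the next value. It is Python 3 syntax and uses a small n for illustration.

```python
class CountingIterator:
    """Hypothetical helper: wraps an iterable and counts __next__ calls."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self.calls = 0

    def __iter__(self):
        return self

    def __next__(self):
        self.calls += 1
        return next(self._it)  # raises StopIteration when exhausted


n = 1000  # illustrative size
counter = CountingIterator(i for i in range(n))
total = sum(counter)

# One call per item, plus one final call that signals exhaustion.
print(total, counter.calls)
```

The count is n + 1, which matches the shape of the profiler output in the question: 10000000 genexpr calls for 9999999 items.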

This is one of those cases in which being eager is better for performance.

The generator version won the timeit comparison. Profiling with cProfile simply introduced far more overhead for the generator expression than for the list comprehension, since the generator expression has many more points where the profiler hooks in.
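One way to see how often the profiler hooks in is to compare the number of calls it records for each form. This is a rough sketch in Python 3 syntax; count_profiled_calls is a hypothetical helper, and the exact counts vary between Python versions, so only the relative difference is meaningful.

```python
import cProfile
import pstats


def count_profiled_calls(func):
    """Hypothetical helper: run func under cProfile and return the
    total number of calls the profiler recorded."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    return pstats.Stats(profiler).total_calls


n = 1000  # illustrative size

list_calls = count_profiled_calls(lambda: sum([i for i in range(n)]))
gen_calls = count_profiled_calls(lambda: sum(i for i in range(n)))

# The generator expression is re-entered once per item, and the profiler
# records each re-entry; the list comprehension is recorded only a few times.
print(list_calls, gen_calls)
```

Because each recorded call carries profiler bookkeeping, the generator expression pays that cost once per item. This is why timeit is the fairer tool for comparing the two forms.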
