
Performance difference between for and while loop in Python

Python's documentation suggests that the for statement is syntactic sugar that hides the machinery of iterators and iterables. If this is true, the following two functions should be equivalent:

def for_loop(seq):
    for i in seq:
        i

and

def while_loop(seq):
    iseq = iter(seq)
    _loop = True
    while _loop:
        try:
            i = next(iseq)
        except StopIteration:
            _loop = False
        else:
            i

Notice that I keep the loop body as simple as possible so that the measurement reflects the cost of the loop itself; that is why I avoid calling print (or similar functions).
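The equivalence claim is about behavior, and that part is easy to check. A minimal sketch, assuming it is enough to compare the values each loop visits: `for_collect` and `while_collect` are hypothetical variants of the two functions above that record each `i` instead of discarding it.

```python
def for_collect(seq):
    # Mirrors for_loop, but records each value instead of discarding it.
    seen = []
    for i in seq:
        seen.append(i)
    return seen


def while_collect(seq):
    # Mirrors while_loop: explicit iter()/next(), with StopIteration ending the loop.
    seen = []
    iseq = iter(seq)
    _loop = True
    while _loop:
        try:
            i = next(iseq)
        except StopIteration:
            _loop = False
        else:
            seen.append(i)
    return seen


# Factories, so single-use iterables such as generators are rebuilt for each call.
samples = {
    "range": lambda: range(5),
    "list": lambda: [3, 1, 2],
    "str": lambda: "abc",
    "empty": lambda: [],
    "generator": lambda: (x * x for x in range(4)),
}

for name, make in samples.items():
    assert for_collect(make()) == while_collect(make()), name
print("Both loops visited the same values for every sample.")
```

Matching output says nothing about speed, which is the real question: two constructs can be functionally equivalent and still compile to different code.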

Here are the results after measuring the performance of these functions in IPython:

In [43]: %timeit for_loop(range(1000))
22.9 µs ± 356 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [44]: %timeit while_loop(range(1000))
49.9 µs ± 825 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [45]: %timeit for_loop(range(100000))
2.63 ms ± 43.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [46]: %timeit while_loop(range(100000))
5.16 ms ± 69.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
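To reproduce these numbers outside IPython, the standard timeit module is enough. A sketch, assuming the minimum of several batches is an acceptable summary (the timeit documentation recommends it, while %timeit prints a mean and standard deviation, so absolute numbers will differ somewhat). `per_call` is a hypothetical helper, and the batch sizes are smaller than the ones %timeit chose, to keep the run short.

```python
import timeit


def for_loop(seq):
    for i in seq:
        i


def while_loop(seq):
    iseq = iter(seq)
    _loop = True
    while _loop:
        try:
            i = next(iseq)
        except StopIteration:
            _loop = False
        else:
            i


def per_call(func, n, number, repeat=5):
    """Seconds per call of func(range(n)): the fastest of `repeat` batches of `number` calls."""
    batches = timeit.repeat(lambda: func(range(n)), number=number, repeat=repeat)
    return min(batches) / number


if __name__ == "__main__":
    # Same input sizes as the %timeit session; fewer calls per batch.
    for n, number in ((1_000, 1_000), (100_000, 10)):
        t_for = per_call(for_loop, n, number)
        t_while = per_call(while_loop, n, number)
        print(f"n={n:>7,}: for {t_for * 1e6:9.1f} us   while {t_while * 1e6:9.1f} us   "
              f"while/for {t_while / t_for:.2f}")
```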

The for statement is roughly twice as fast as the while loop (a somewhat smaller ratio of about 1.6x is observed when passing in a long list instead of a range object). The ratio stays roughly constant across a range of values of len(seq). I also see differences in the bytecode of the two functions when I disassemble them with the dis module.
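The bytecode difference can be listed with the standard dis module. A sketch; `opnames` is a hypothetical helper, and exact opcode names vary between CPython versions. In general the for version advances with a single FOR_ITER instruction per element, which calls the iterator's next slot directly, while the while version looks up the name next, calls it, stores the result and re-tests _loop on every pass. That per-step overhead is one plausible source of the gap, not a proof of it.

```python
import dis


def for_loop(seq):
    for i in seq:
        i


def while_loop(seq):
    iseq = iter(seq)
    _loop = True
    while _loop:
        try:
            i = next(iseq)
        except StopIteration:
            _loop = False
        else:
            i


def opnames(func):
    """Names of the bytecode instructions CPython compiled for func, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    for func in (for_loop, while_loop):
        names = opnames(func)
        print(f"{func.__name__}: {len(names)} instructions")
        print("   " + " ".join(names))
    # Full listing with offsets and jump targets (and, on 3.11+, the exception table).
    dis.dis(while_loop)
```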

To summarize: Python's documentation suggests that a for statement is effectively a cover for while_loop, yet it runs about twice as fast. Can a Pythonista explain the performance difference, and in particular what its source is (a CPython optimization, ...)?

A couple of notes:

  • The fact that for is, from a functional perspective, just syntactic sugar for while does not mean that it has the same implementation under the hood. Different implementations may have very different performance.
  • If you look at the CPython implementation, you'll notice that for and while do indeed have different implementations. See, for example, the functions compiler_while and compiler_for in https://github.com/python/cpython/blob/ee40e4b8563e6e1bc2bfb267da5ffc9a2293318d/Python/compile.c
  • I suspect that for has some sort of optimization for ranges. When, instead of timing your two functions with range(1000000), I timed them with np.random.rand(1000000), the gap on my laptop went down from 3x (24 ms vs. 73 ms) to 50% (101 ms vs. 155 ms).
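The range-versus-array comparison in the last note can be approximated without NumPy. A sketch, assuming a plain list of random floats is a fair stand-in for np.random.rand (it is not identical: iterating a NumPy array creates a new NumPy scalar object for each element, which iterating a list does not). `ratio` is a hypothetical helper that reports while-time divided by for-time on the same pre-built input, so the cost of building the input is excluded.

```python
import random
import timeit


def for_loop(seq):
    for i in seq:
        i


def while_loop(seq):
    iseq = iter(seq)
    _loop = True
    while _loop:
        try:
            i = next(iseq)
        except StopIteration:
            _loop = False
        else:
            i


def ratio(seq, number=10, repeat=5):
    """while_loop time divided by for_loop time, best of `repeat` batches of `number` calls each."""
    t_for = min(timeit.repeat(lambda: for_loop(seq), number=number, repeat=repeat))
    t_while = min(timeit.repeat(lambda: while_loop(seq), number=number, repeat=repeat))
    return t_while / t_for


if __name__ == "__main__":
    n = 100_000  # raise to 1_000_000 to match the note's setup (slower)
    inputs = {
        "range": range(n),
        "list of ints": list(range(n)),
        # Stand-in for np.random.rand(n) so that NumPy is not required.
        "list of floats": [random.random() for _ in range(n)],
    }
    for name, seq in inputs.items():
        print(f"{name:>14}: while/for = {ratio(seq):.2f}")
```

Comparing the range row with the two list rows separates any range-specific effect from the effect of the element type; the NumPy case itself still has to be timed with NumPy installed.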
