itertools.product is slower than nested for loops

Question

I am trying to use the itertools.product function to make a segment of my code (in an isotopic pattern simulator) easier to read and, hopefully, faster as well (the documentation states that no intermediate results are created). However, I profiled both versions of the code against each other with the cProfile module and found that the itertools.product version was significantly slower than my nested for loops.

Example values used for the testing:

carbons = [(0.0, 0.004613223957020534), (1.00335, 0.02494768843632857), (2.0067, 0.0673219412049374), (3.0100499999999997, 0.12087054681917497), (4.0134, 0.16243239687902825), (5.01675, 0.17427700732161705), (6.020099999999999, 0.15550695260604208), (7.0234499999999995, 0.11869556397525197), (8.0268, 0.07911287899598853), (9.030149999999999, 0.04677626606764402)]
hydrogens = [(0.0, 0.9417611429667746), (1.00628, 0.05651245007201512)]
nitrogens = [(0.0, 0.16148864310897554), (0.99703, 0.2949830688288726), (1.99406, 0.26887643366755537), (2.99109, 0.16305943261399866), (3.98812, 0.0740163089529218), (4.98515, 0.026824040474519875), (5.98218, 0.008084687617425748)]
oxygens17 = [(0.0, 0.8269292736927519), (1.00422, 0.15717628899143962), (2.00844, 0.014907548827832968)]
oxygens18 = [(0.0, 0.3584191873916266), (2.00425, 0.36813434247849824), (4.0085, 0.18867830334103902), (6.01275, 0.06433912182670033), (8.017, 0.016421642936302827)]
sulfurs33 = [(0.0, 0.02204843659673093), (0.99939, 0.08442569434459646), (1.99878, 0.16131398792444965), (2.99817, 0.2050722764666321), (3.99756, 0.1951327596407101), (4.99695, 0.14824112268069747), (5.99634, 0.09365899226198841), (6.99573, 0.050618028523695714), (7.99512, 0.023888506307006133), (8.99451, 0.010000884811585533)]
sulfurs34 = [(0.0, 3.0106350597190195e-10), (1.9958, 6.747270089956428e-09), (3.9916, 7.54568412614702e-08), (5.9874, 5.614443102700176e-07), (7.9832, 3.1268212758750728e-06), (9.979, 1.3903197959791067e-05), (11.9748, 5.141248916434075e-05), (13.970600000000001, 0.0001626288218672788), (15.9664, 0.00044921518047309414), (17.9622, 0.0011007203440032396)]
sulfurs36 = [(0.0, 0.904828368500412), (3.99501, 0.0905009370374487)]

Snippet demonstrating the nested for loops:

totals = []
for i in carbons:
    for j in hydrogens:
        for k in nitrogens:
            for l in oxygens17:
                for m in oxygens18:
                    for n in sulfurs33:
                        for o in sulfurs34:
                            for p in sulfurs36:
                                totals.append((i[0]+j[0]+k[0]+l[0]+m[0]+n[0]+o[0]+p[0], i[1]*j[1]*k[1]*l[1]*m[1]*n[1]*o[1]*p[1]))

Snippet demonstrating the use of itertools.product:

import itertools

totals = []
for i in itertools.product(carbons,hydrogens,nitrogens,oxygens17,oxygens18,sulfurs33,sulfurs34,sulfurs36):
    massDiff = i[0][0]
    chance = i[0][1]
    for j in i[1:]:
        massDiff += j[0]
        chance = chance * j[1]
    totals.append((massDiff,chance))

The profiling results (based on 10 runs per method) were an average of ~0.8 seconds for the nested for loop approach and ~1.3 seconds for the itertools.product approach. My question is thus: am I using the itertools.product function wrongly, or should I just stick to the nested for loops?

-- UPDATE --

I have included two of my cProfile results:

# ITERTOOLS.PRODUCT APPROACH 
420003 function calls in 1.306 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.018    0.018    1.306    1.306 <string>:1(<module>)
        1    1.246    1.246    1.289    1.289 IsotopeBas.py:64(option1)
   420000    0.042    0.000    0.042    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

and:

# NESTED FOR LOOP APPROACH
420003 function calls in 0.830 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.019    0.019    0.830    0.830 <string>:1(<module>)
        1    0.769    0.769    0.811    0.811 IsotopeBas.py:78(option2)
   420000    0.042    0.000    0.042    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
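Profiles like the two above can be produced along the following lines. This is a minimal sketch, not the original IsotopeBas.py: option1 and option2 only echo the function names visible in the profile output, their bodies are assumed to match the two snippets in the question, and just three shortened element lists are used to keep the run short.

```python
import cProfile
import io
import itertools
import pstats

# Shortened inputs copied from the question (assumption: three lists suffice for a demo).
carbons = [(0.0, 0.004613223957020534), (1.00335, 0.02494768843632857),
           (2.0067, 0.0673219412049374)]
hydrogens = [(0.0, 0.9417611429667746), (1.00628, 0.05651245007201512)]
oxygens17 = [(0.0, 0.8269292736927519), (1.00422, 0.15717628899143962),
             (2.00844, 0.014907548827832968)]


def option1():
    """itertools.product approach, including the inner accumulation loop."""
    totals = []
    for combo in itertools.product(carbons, hydrogens, oxygens17):
        mass_diff, chance = combo[0]
        for item in combo[1:]:
            mass_diff += item[0]
            chance *= item[1]
        totals.append((mass_diff, chance))
    return totals


def option2():
    """Nested for loop approach."""
    totals = []
    for i in carbons:
        for j in hydrogens:
            for k in oxygens17:
                totals.append((i[0] + j[0] + k[0], i[1] * j[1] * k[1]))
    return totals


def profile(func):
    """Run func under cProfile and return the statistics report as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
    return stream.getvalue()


report1 = profile(option1)
report2 = profile(option2)
print(report1)
print(report2)
```

With inputs this small the timings are dominated by noise, so the full lists from the question would be needed to see a difference of the size reported above.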

Answer 1

Your original itertools code spent a lot of extra time in the needless lambda and in building lists of intermediate values by hand; much of this can be replaced with builtin functionality.

Now, the inner for loop does add quite a lot of extra overhead: just try the following, and the performance is very much on par with your original code:

totals = []
for a in itertools.product(carbons,hydrogens,nitrogens,oxygens17,
                           oxygens18,sulfurs33,sulfurs34,sulfurs36):
    i, j, k, l, m, n, o, p = a
    totals.append((i[0]+j[0]+k[0]+l[0]+m[0]+n[0]+o[0]+p[0],
                   i[1]*j[1]*k[1]*l[1]*m[1]*n[1]*o[1]*p[1]))

The following code runs as much as possible on the CPython builtin side, and I tested it to be equivalent to your code. Notably, it uses zip(*iterable) to unzip each of the product results, then uses reduce with operator.mul for the product and sum for the summation, with two generators for going through the lists. The for loop still wins slightly, but being hardcoded it is probably not what you can use in the long run.

import itertools
from operator import mul
from functools import partial, reduce  # reduce is not a builtin in Python 3

prod = partial(reduce, mul)
elems = carbons, hydrogens, nitrogens, oxygens17, oxygens18, sulfurs33, sulfurs34, sulfurs36
p = itertools.product(*elems)

totals = [
    ( sum(massdiffs), prod(chances) )
    for massdiffs, chances in
    ( zip(*i) for i in p )
]
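On Python 3.8 or later, the same idea can be written without reduce. The following is a sketch under that version assumption; the helper name combine and the two tiny input lists are made up for illustration and are not part of the answer. It swaps in math.prod for the reduce/operator.mul pair and accepts any number of element lists.

```python
import itertools
import math


def combine(*elements):
    """Return (summed mass difference, multiplied chance) for every combination.

    Each argument is a list of (mass_diff, chance) tuples.
    """
    return [
        (sum(mass_diffs), math.prod(chances))
        # zip(*combo) turns ((m1, c1), (m2, c2), ...) into (m1, m2, ...), (c1, c2, ...)
        for mass_diffs, chances in (
            zip(*combo) for combo in itertools.product(*elements)
        )
    ]


# Made-up values, chosen only to keep the example small.
carbons = [(0.0, 0.5), (1.0, 0.25)]
hydrogens = [(0.0, 0.9), (1.0, 0.1)]

totals = combine(carbons, hydrogens)
print(totals)
```

Because the number of lists is no longer hardcoded, this form is easier to extend to more elements, at the cost of the per-combination zip call.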

Answer 2

My strong suspicion is that the slowness comes from the creation of temporary variables, the in-place additions, and the creation of a function every time via lambda, as well as the overhead of the function call. To demonstrate why the way you do the addition is slower in case 2, I did this:

import dis
s = '''
a = (1, 2)
b = (2, 3)
c = (3, 4)

z = (a[0] + b[0] + c[0])

t = 0
t += a[0]
t += b[0]
t += c[0]
'''

x = compile(s, '', 'exec')

dis.dis(x)

This gives:

<snip out variable declaration>
5          18 LOAD_NAME                0 (a)
           21 LOAD_CONST               4 (0)
           24 BINARY_SUBSCR
           25 LOAD_NAME                1 (b)
           28 LOAD_CONST               4 (0)
           31 BINARY_SUBSCR
           32 BINARY_ADD
           33 LOAD_NAME                2 (c)
           36 LOAD_CONST               4 (0)
           39 BINARY_SUBSCR
           40 BINARY_ADD
           41 STORE_NAME               3 (z)

7          50 LOAD_NAME                4 (t)
           53 LOAD_NAME                0 (a)
           56 LOAD_CONST               4 (0)
           59 BINARY_SUBSCR
           60 INPLACE_ADD
           61 STORE_NAME               4 (t)

8          64 LOAD_NAME                4 (t)
           67 LOAD_NAME                1 (b)
           70 LOAD_CONST               4 (0)
           73 BINARY_SUBSCR
           74 INPLACE_ADD
           75 STORE_NAME               4 (t)

9          78 LOAD_NAME                4 (t)
           81 LOAD_NAME                2 (c)
           84 LOAD_CONST               4 (0)
           87 BINARY_SUBSCR
           88 INPLACE_ADD
           89 STORE_NAME               4 (t)
           92 LOAD_CONST               5 (None)
           95 RETURN_VALUE

As you can see, the += addition costs two additional opcodes per term compared with the inline addition. This overhead comes from needing to load and store the name. I imagine this is just the beginning; Antti Haapala's code spends more of its time in CPython builtins calling C code than running Python bytecode. Function call overhead is expensive in Python.
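The opcode comparison can be repeated on any interpreter by counting instructions instead of reading the listing. This is only a sketch: the helper instruction_count is a made-up name, and the exact opcodes and totals differ between CPython versions, so it prints the counts rather than assuming specific values.

```python
import dis

# The two ways of adding up the first tuple elements, as in the listing above.
INLINE = "z = a[0] + b[0] + c[0]"
IN_PLACE = "t = 0\nt += a[0]\nt += b[0]\nt += c[0]"


def instruction_count(source):
    """Compile source as module-level code and count its bytecode instructions."""
    code = compile(source, "<sketch>", "exec")
    return sum(1 for _ in dis.get_instructions(code))


inline_count = instruction_count(INLINE)
in_place_count = instruction_count(IN_PLACE)
print("inline:", inline_count, "in-place:", in_place_count)
```

The in-place version should come out larger, since every += has to load and store t again.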

Answer 3

I timed these two functions, which use the absolute minimum of extra code:

from itertools import product

def nested_for(first_iter, second_iter):
    for i in first_iter:
        for j in second_iter:
            pass

def using_product(first_iter, second_iter):
    for i in product(first_iter, second_iter):
        pass

Their bytecode instructions are similar:

dis(nested_for)
  2           0 SETUP_LOOP              26 (to 28)
              2 LOAD_FAST                0 (first_iter)
              4 GET_ITER
        >>    6 FOR_ITER                18 (to 26)
              8 STORE_FAST               2 (i)

  3          10 SETUP_LOOP              12 (to 24)
             12 LOAD_FAST                1 (second_iter)
             14 GET_ITER
        >>   16 FOR_ITER                 4 (to 22)
             18 STORE_FAST               3 (j)

  4          20 JUMP_ABSOLUTE           16
        >>   22 POP_BLOCK
        >>   24 JUMP_ABSOLUTE            6
        >>   26 POP_BLOCK
        >>   28 LOAD_CONST               0 (None)
             30 RETURN_VALUE

dis(using_product)
  2           0 SETUP_LOOP              18 (to 20)
              2 LOAD_GLOBAL              0 (product)
              4 LOAD_FAST                0 (first_iter)
              6 LOAD_FAST                1 (second_iter)
              8 CALL_FUNCTION            2
             10 GET_ITER
        >>   12 FOR_ITER                 4 (to 18)
             14 STORE_FAST               2 (i)

  3          16 JUMP_ABSOLUTE           12
        >>   18 POP_BLOCK
        >>   20 LOAD_CONST               0 (None)
             22 RETURN_VALUE

And here are the results:

>>> timer = partial(timeit, number=1000, globals=globals())
>>> timer("nested_for(range(100), range(100))")
0.1294467518782625
>>> timer("using_product(range(100), range(100))")
0.4335527486212385

The results of additional tests performed via timeit and manual use of perf_counter were consistent with those above. Using product is clearly substantially slower than using nested for loops. However, based on the tests already shown in the previous answers, the relative gap between the two approaches appears to shrink as the number of nested loops grows (and, with it, the size of the tuple containing the Cartesian product).
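One way to probe how the gap depends on nesting depth is to time both approaches at two depths with the same total number of iterations. This is a rough sketch rather than a benchmark: nested_2, nested_4, and ratio are made-up helper names, the depths and sizes are arbitrary, and the measured ratios will vary by machine and Python version.

```python
from itertools import product
from timeit import timeit


def nested_2(seq):
    for a in seq:
        for b in seq:
            pass


def nested_4(seq):
    for a in seq:
        for b in seq:
            for c in seq:
                for d in seq:
                    pass


def using_product(seq, depth):
    # product(seq, repeat=depth) is the Cartesian product of seq with itself depth times
    for _ in product(seq, repeat=depth):
        pass


def ratio(nested, seq, depth, number=20):
    """Return how many times slower product is than the nested loops."""
    t_nested = timeit(lambda: nested(seq), number=number)
    t_product = timeit(lambda: using_product(seq, depth), number=number)
    return t_product / t_nested


ratio_2 = ratio(nested_2, range(100), 2)  # 100**2 = 10,000 iterations
ratio_4 = ratio(nested_4, range(10), 4)   # 10**4 = 10,000 iterations
print("depth 2:", ratio_2, "depth 4:", ratio_4)
```

Both cases run an empty loop body, so they isolate iteration overhead only; with real work in the body, as in the question, the relative difference is smaller.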

Answer 1: score 6, accepted, 2014-07-03 14:33:06
Answer 2: score 2, 2014-07-03 14:16:29
Answer 3: score 0, 2017-10-05 11:30:57
