"for" iteration vs. direct indexing performance

Question

I just got a strange result that I am trying to understand. I have a dataset of about 325k rows (lists) with about 90 items each (strings, floats, etc.; it doesn't really matter). Say I want to do some processing on every item; then I can iterate over them using two "for" loops:

for eachRow in rows:
    for eachItem in eachRow:
        # do something

On my system this code ran for 41 sec. But if I replace the nested loop with a series of index accesses (eachRow[0], eachRow[1] and so on up to eachRow[89]), the execution time drops to 25 sec.

for eachRow in rows:
    eachRow[0]  # do something with this item
    eachRow[1]  # do something with this item
    ..
    eachRow[89] # do something with this item

Of course, writing code like that is not a good idea. I was just looking for a way to improve data processing performance and accidentally found this strange approach. Any comments?

Answer 1

There does seem to be a slight performance advantage to unrolling, but it's negligible, so unless your do_something function really does almost nothing, you shouldn't see the difference. I have a tough time believing that equivalent behaviour with a different approach could amount to a difference of 60%, although I'm always willing to be surprised by some implementation detail I'd never thought about.

tl;dr summary, using 32500 rows instead of 325000 because I'm impatient:

do_nothing easy 3.44702410698
do_nothing indexed 3.99766016006
do_nothing mapped 4.36127090454
do_nothing unrolled 3.33416581154
do_something easy 5.4152610302
do_something indexed 5.95649385452
do_something mapped 6.20316290855
do_something unrolled 5.2877831459
do_more easy 16.6573209763
do_more indexed 16.8381450176
do_more mapped 17.6184959412
do_more unrolled 16.0713188648
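
To put "slight" into numbers, the timings above can be expressed relative to the plain nested loop. The sketch below is written for Python 3 and uses the reported figures rounded to three decimals; the helper name relative_to_easy is made up for this illustration. On these numbers the unrolled variant comes out roughly 2-4% faster than the nested loop, nowhere near the 60% reported in the question.

```python
# Timings in seconds, copied from the results above and rounded to 3 decimals.
timings = {
    "do_nothing": {"easy": 3.447, "indexed": 3.998, "mapped": 4.361, "unrolled": 3.334},
    "do_something": {"easy": 5.415, "indexed": 5.956, "mapped": 6.203, "unrolled": 5.288},
    "do_more": {"easy": 16.657, "indexed": 16.838, "mapped": 17.618, "unrolled": 16.071},
}


def relative_to_easy(timings):
    """Return each variant's time as a percentage change versus 'easy'."""
    result = {}
    for action, by_variant in timings.items():
        base = by_variant["easy"]
        result[action] = {
            name: 100.0 * (t - base) / base for name, t in by_variant.items()
        }
    return result


if __name__ == "__main__":
    for action, changes in relative_to_easy(timings).items():
        for name, pct in changes.items():
            # Negative values mean faster than the nested loop.
            print("{:13s} {:9s} {:+6.1f}%".format(action, name, pct))
```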

CPython 2.7.3, code:

from timeit import Timer

nrows = 32500
ncols = 90
a = [[1.0*i for i in range(ncols)] for j in range(nrows)]

def do_nothing(x):
    pass

def do_something(x):
    z = x+3
    return z

def do_more(x):
    z = x**3+x**0.5+4
    return z

def easy(rows, action):
    for eachRow in rows:
        for eachItem in eachRow:
            action(eachItem)

def mapped(rows, action):
    for eachRow in rows:
        map(action, eachRow)

def indexed(rows, action):
    for eachRow in rows:
        for i in xrange(len(eachRow)):
            action(eachRow[i])

def unrolled(rows, action):
    for eachRow in rows:
        action(eachRow[0])
        action(eachRow[1])
        action(eachRow[2])
        action(eachRow[3])
        action(eachRow[4])
        action(eachRow[5])
        action(eachRow[6])
        action(eachRow[7])
        action(eachRow[8])
        action(eachRow[9])
        action(eachRow[10])
        action(eachRow[11])
        action(eachRow[12])
        action(eachRow[13])
        action(eachRow[14])
        action(eachRow[15])
        action(eachRow[16])
        action(eachRow[17])
        action(eachRow[18])
        action(eachRow[19])
        action(eachRow[20])
        action(eachRow[21])
        action(eachRow[22])
        action(eachRow[23])
        action(eachRow[24])
        action(eachRow[25])
        action(eachRow[26])
        action(eachRow[27])
        action(eachRow[28])
        action(eachRow[29])
        action(eachRow[30])
        action(eachRow[31])
        action(eachRow[32])
        action(eachRow[33])
        action(eachRow[34])
        action(eachRow[35])
        action(eachRow[36])
        action(eachRow[37])
        action(eachRow[38])
        action(eachRow[39])
        action(eachRow[40])
        action(eachRow[41])
        action(eachRow[42])
        action(eachRow[43])
        action(eachRow[44])
        action(eachRow[45])
        action(eachRow[46])
        action(eachRow[47])
        action(eachRow[48])
        action(eachRow[49])
        action(eachRow[50])
        action(eachRow[51])
        action(eachRow[52])
        action(eachRow[53])
        action(eachRow[54])
        action(eachRow[55])
        action(eachRow[56])
        action(eachRow[57])
        action(eachRow[58])
        action(eachRow[59])
        action(eachRow[60])
        action(eachRow[61])
        action(eachRow[62])
        action(eachRow[63])
        action(eachRow[64])
        action(eachRow[65])
        action(eachRow[66])
        action(eachRow[67])
        action(eachRow[68])
        action(eachRow[69])
        action(eachRow[70])
        action(eachRow[71])
        action(eachRow[72])
        action(eachRow[73])
        action(eachRow[74])
        action(eachRow[75])
        action(eachRow[76])
        action(eachRow[77])
        action(eachRow[78])
        action(eachRow[79])
        action(eachRow[80])
        action(eachRow[81])
        action(eachRow[82])
        action(eachRow[83])
        action(eachRow[84])
        action(eachRow[85])
        action(eachRow[86])
        action(eachRow[87])
        action(eachRow[88])
        action(eachRow[89])


def timestuff():
    for action in 'do_nothing do_something do_more'.split():
        for name in 'easy indexed mapped unrolled'.split():
            t = Timer(setup="""
from __main__ import {} as fn
from __main__ import {} as action
from __main__ import a
""".format(name, action),
                      stmt="fn(a, action)").timeit(10)
            print action, name, t

if __name__ == '__main__':
    timestuff()

(Note that I didn't bother making the comparisons exactly fair, because I was only trying to gauge the likely scale of the variations, i.e. whether the changes are of order unity or not.)
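
For anyone who wants to repeat the comparison on Python 3, here is a minimal sketch rather than a port of the script above. It assumes the same 90-column layout, builds the unrolled function from generated source instead of 90 hand-typed lines, and reports the fastest of several runs to reduce noise. The helpers make_unrolled and best_time are hypothetical names introduced for this example.

```python
import timeit

NCOLS = 90  # assumed row width, matching the data in the question


def easy(rows, action):
    """Plain nested loop over rows and items."""
    for row in rows:
        for item in row:
            action(item)


def make_unrolled(ncols):
    """Build the unrolled function from source text instead of typing it out."""
    lines = ["def unrolled(rows, action):", "    for row in rows:"]
    lines += ["        action(row[{}])".format(i) for i in range(ncols)]
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["unrolled"]


unrolled = make_unrolled(NCOLS)


def do_nothing(x):
    pass


def best_time(fn, rows, action, repeat=5, number=3):
    """Return the fastest observed time per call, in seconds."""
    timer = timeit.Timer(lambda: fn(rows, action))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == "__main__":
    rows = [[1.0 * i for i in range(NCOLS)] for _ in range(2000)]
    for name, fn in (("easy", easy), ("unrolled", unrolled)):
        print(name, best_time(fn, rows, do_nothing))
```

Taking the minimum of several repeats is the usual way to read timeit results, since slower runs mostly reflect interference from other processes rather than the code under test.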

Answer 2

Sorry guys, it was my fault. Something was wrong with my system (this is not a standalone Python interpreter but one embedded in a larger system). After restarting the whole system I got the right results: about 2.8 sec for both variants. I feel stupid. I'm looking for a way to delete my question, since it is no longer relevant.

Answer 3

Unlike the other responder who timed this, I saw quite a large difference in timings. First, my code:

import random
import string
import timeit

r = 1000
outer1 = [[[''.join([random.choice(string.ascii_letters) for j in range(10)])] for k in range(90)] for l in range(r)]
outer2 = [[[''.join([random.choice(string.ascii_letters) for j in range(10)])] for k in range(90)] for l in range(r)]
outer3 = [[[''.join([random.choice(string.ascii_letters) for j in range(10)])] for k in range(90)] for l in range(r)]

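def x1(L):
    for outer in L:
        # Note: this iterates over L again, not over outer, so the body
        # runs len(L) * len(L) times and slices whole 90-item rows.
        for inner in L:
            inner = inner[:-1]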

def x2(L):
    for outer in L:
        for y in range(len(outer)):
            outer[y] = outer[y][:-1]

def x3(L):
    for x in range(len(L)):
        for y in range(len(L[x])):
            L[x][y] = L[x][y][:-1]

print "x1 =",timeit.Timer('x1(outer1)', "from __main__ import x1,outer1").timeit(10)
print "x2 =",timeit.Timer('x2(outer2)', "from __main__ import x2,outer2").timeit(10)
print "x3 =",timeit.Timer('x3(outer3)', "from __main__ import x3,outer3").timeit(10)

Note I'm running each of these 10 times. Each list is populated with r rows (1000 in the code above), each of which contains 90 items; every item is a one-element list holding a random string of ten letters.

Representative results:

x1 = 8.0179214353
x2 = 0.118051644801
x3 = 0.150409681521

The function that uses no indexing (x1) takes about 68 times longer to execute (8.02 s versus 0.118 s) than the one that uses indexing only for the inner loop (x2). Bear in mind, though, that x1's inner loop iterates over L rather than over outer, so it slices a whole row len(L) * len(L) times instead of slicing each item once; the three functions are therefore not doing the same amount of work. Oddly enough, the function that uses indexing only for the inner loop (x2) performs better than the one that uses indexing for both the outer loop and the inner loop (x3).
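
To see how much of the gap comes from the loop shape rather than from indexing, it helps to count iterations. The sketch below (Python 3) mirrors x1 as posted next to a hypothetical variant whose inner loop walks the current row. The names x1_as_posted, x1_over_outer and make_data are made up for this illustration, and the data uses a fixed string rather than random letters. With 1000 rows the first performs 1,000,000 slices of 90-item rows, the second 90,000 slices of one-item lists.

```python
def x1_as_posted(L):
    """Same loop shape as x1 above: the inner loop walks L, not outer."""
    steps = 0
    for outer in L:
        for inner in L:
            inner = inner[:-1]  # copies a whole row minus its last item
            steps += 1
    return steps


def x1_over_outer(L):
    """Hypothetical variant: the inner loop walks the current row."""
    steps = 0
    for outer in L:
        for inner in outer:
            inner = inner[:-1]  # slices a single one-element item
            steps += 1
    return steps


def make_data(rows, cols=90):
    # One-element lists holding a string, mirroring the shape of outer1.
    return [[["abcdefghij"] for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    data = make_data(1000)
    print("as posted: ", x1_as_posted(data))
    print("over outer:", x1_over_outer(data))
```

A like-for-like benchmark would also need fresh data for every run, because x2 and x3 modify their input in place: after the first call each item is already an empty list.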


Answer 1
1 2012-09-10 16:53:24

Answer 2
0 2012-09-10 16:51:46

Answer 3
0 2012-09-10 17:58:05
