“for” iteration vs. direct indexing performance

Question

Just got a strange results that I am trying to understand. I have a dataset about 325k rows (lists) with about 90 items each (strings, floats etc - it doesn't really matter). Say, if I want to do some processing for all item then I can iterate over them using 2 "for"s:

for eachRow in rows:
    for eachItem in eachRow:
        # do something

In my system this code executed for 41 sec. But if I replace nested loop with series of index acess ( eachRow[0], eachRowm[1] and so far up to eachRow[89] ), the execution time drops to 25 sec.

for eachRow in rows:
    eachRow[0]  # do something with this item
    eachRow[1]  # do something with this item
    ..
    eachRow[89] # do something with this item

Of course, writing code like that is not a good idea - I was just looking for a way to impove data processing performance and accidentally found this strange approach. Any comments?

Answer 1

There does seem to be a slight performance advantage to doing the unrolling, but it's negligible, and so unless your do_something function really does almost nothing, you shouldn't see the difference. I have a tough time believing equivalent behaviour with the different approach could amount to a factor of 60%, although I'm always willing to be surprised by some implementation detail I'd never thought about.

tl;dr summary, using 32500 instead of 325000 because I'm impatient:

do_nothing easy 3.44702410698
do_nothing indexed 3.99766016006
do_nothing mapped 4.36127090454
do_nothing unrolled 3.33416581154
do_something easy 5.4152610302
do_something indexed 5.95649385452
do_something mapped 6.20316290855
do_something unrolled 5.2877831459
do_more easy 16.6573209763
do_more indexed 16.8381450176
do_more mapped 17.6184959412
do_more unrolled 16.0713188648

CPython 2.7.3, code:

from timeit import Timer

nrows = 32500
ncols = 90
a = [[1.0*i for i in range(ncols)] for j in range(nrows)]

def do_nothing(x):
    pass

def do_something(x):
    z = x+3
    return z

def do_more(x):
    z = x**3+x**0.5+4
    return z

def easy(rows, action):
    for eachRow in rows:
        for eachItem in eachRow:
            action(eachItem)

def mapped(rows, action):
    for eachRow in rows:
        map(action, eachRow)

def indexed(rows, action):
    for eachRow in rows:
        for i in xrange(len(eachRow)):
            action(eachRow[i])

def unrolled(rows, action):
    for eachRow in rows:
        action(eachRow[0])
        action(eachRow[1])
        action(eachRow[2])
        action(eachRow[3])
        action(eachRow[4])
        action(eachRow[5])
        action(eachRow[6])
        action(eachRow[7])
        action(eachRow[8])
        action(eachRow[9])
        action(eachRow[10])
        action(eachRow[11])
        action(eachRow[12])
        action(eachRow[13])
        action(eachRow[14])
        action(eachRow[15])
        action(eachRow[16])
        action(eachRow[17])
        action(eachRow[18])
        action(eachRow[19])
        action(eachRow[20])
        action(eachRow[21])
        action(eachRow[22])
        action(eachRow[23])
        action(eachRow[24])
        action(eachRow[25])
        action(eachRow[26])
        action(eachRow[27])
        action(eachRow[28])
        action(eachRow[29])
        action(eachRow[30])
        action(eachRow[31])
        action(eachRow[32])
        action(eachRow[33])
        action(eachRow[34])
        action(eachRow[35])
        action(eachRow[36])
        action(eachRow[37])
        action(eachRow[38])
        action(eachRow[39])
        action(eachRow[40])
        action(eachRow[41])
        action(eachRow[42])
        action(eachRow[43])
        action(eachRow[44])
        action(eachRow[45])
        action(eachRow[46])
        action(eachRow[47])
        action(eachRow[48])
        action(eachRow[49])
        action(eachRow[50])
        action(eachRow[51])
        action(eachRow[52])
        action(eachRow[53])
        action(eachRow[54])
        action(eachRow[55])
        action(eachRow[56])
        action(eachRow[57])
        action(eachRow[58])
        action(eachRow[59])
        action(eachRow[60])
        action(eachRow[61])
        action(eachRow[62])
        action(eachRow[63])
        action(eachRow[64])
        action(eachRow[65])
        action(eachRow[66])
        action(eachRow[67])
        action(eachRow[68])
        action(eachRow[69])
        action(eachRow[70])
        action(eachRow[71])
        action(eachRow[72])
        action(eachRow[73])
        action(eachRow[74])
        action(eachRow[75])
        action(eachRow[76])
        action(eachRow[77])
        action(eachRow[78])
        action(eachRow[79])
        action(eachRow[80])
        action(eachRow[81])
        action(eachRow[82])
        action(eachRow[83])
        action(eachRow[84])
        action(eachRow[85])
        action(eachRow[86])
        action(eachRow[87])
        action(eachRow[88])
        action(eachRow[89])


def timestuff():
    for action in 'do_nothing do_something do_more'.split():
        for name in 'easy indexed mapped unrolled'.split():
            t = Timer(setup="""
from __main__ import {} as fn
from __main__ import {} as action
from __main__ import a
""".format(name, action),
                      stmt="fn(a, action)").timeit(10)
            print action, name, t

if __name__ == '__main__':
    timestuff()

(Note that I didn't bother making the comparisons exactly fair, because I was only trying to gauge the likely scale of the variations, ie changes of order unity or not.)

Answer 2

Sorry guys, it was my fault. It was something wrong with my system (this is not a standalone Python interpreter but a built-in in big system). After restarting the whole system I've got right results - about 2.8 sec for both variants. I feel stupid. Looking for a way to delete my question because of irrelevance.

Answer 3

Unlike the other responder who timed this, I saw a quite a large difference in timings. First, my code:

import random
import string
import timeit

r = 1000
outer1 = [[[''.join([random.choice(string.ascii_letters) for j in range(10)])] for k in range(90)] for l in range(r)]
outer2 = [[[''.join([random.choice(string.ascii_letters) for j in range(10)])] for k in range(90)] for l in range(r)]
outer3 = [[[''.join([random.choice(string.ascii_letters) for j in range(10)])] for k in range(90)] for l in range(r)]

def x1(L):
    for outer in L:
        for inner in L:
            inner = inner[:-1]

def x2(L):
    for outer in L:
        for y in range(len(outer)):
            outer[y] = outer[y][:-1]

def x3(L):
    for x in range(len(L)):
        for y in range(len(L[x])):
            L[x][y] = L[x][y][:-1]

print "x1 =",timeit.Timer('x1(outer1)', "from __main__ import x1,outer1").timeit(10)
print "x2 =",timeit.Timer('x2(outer2)', "from __main__ import x2,outer2").timeit(10)
print "x3 =",timeit.Timer('x3(outer3)', "from __main__ import x3,outer3").timeit(10)

Note I'm running each of these 10 times. Each list is being populated with 3000 items which each contain 90 items which are each a random string of ten letters.

Representative results:

x1 = 8.0179214353
x2 = 0.118051644801
x3 = 0.150409681521

The function that uses no indexing (x1) takes 66 times longer to execute than does the one which uses indexing only for the inner loop (x2). Oddly enough, the function which only uses the indexing for the inner loop (x2) performs better than the one which uses indexing for both the outer loop and the inner loop (x3).

“for” iteration vs. direct indexing performance

Question

3 answers

solution1
1 2012-09-10 16:53:24

solution2
0 2012-09-10 16:51:46

solution3
0 2012-09-10 17:58:05

“for” iteration vs. direct indexing performance

Question

3 answers

solution1 1 2012-09-10 16:53:24

solution2 0 2012-09-10 16:51:46

solution3 0 2012-09-10 17:58:05

solution1
1 2012-09-10 16:53:24

solution2
0 2012-09-10 16:51:46

solution3
0 2012-09-10 17:58:05