
Is Python automagically parallelizing IO- and CPU- or memory-bound sections?

This is a follow-up question to a previous one.

Consider this code, which is less of a toy than the one in the previous question (but still much simpler than my real one):

import sys
data=[]

for line in open(sys.argv[1]):
    data.append(line[-1])

print data[-1]

Now, I was expecting a longer run time (my benchmark file is 65150224 lines long), possibly much longer. That was not the case: it runs in about 2 minutes on the same hardware as before!

Is data.append() very lightweight? I didn't believe so, so I wrote this fake code to test it:

data=[]
counter=0
string="a\n"

for counter in xrange(65150224):
    data.append(string[-1])

print data[-1]

This runs in 1.5 to 3 minutes (there is strong variability among runs).

Why don't I get 3.5 to 5 minutes in the former program? Obviously data.append() is happening in parallel with the IO.

This is good news!

But how does it work? Is it a documented feature? Is there any requirement my code should follow to make it work as much as possible (besides load-balancing IO and memory/CPU activity)? Or is it just plain buffering/caching in action?

Again, I tagged this question "linux" because I'm interested only in Linux-specific answers. Feel free to give OS-agnostic, or even other-OS, answers if you think it's worth doing.

"Obviously data.append() is happening in parallel with the IO."

I'm afraid not. It is possible to parallelize IO and computation in Python, but it doesn't happen magically.
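For example, here is a minimal sketch of one explicit way to overlap reading with processing: a reader thread fills a bounded queue while the main thread consumes it. This is illustrative only; the thread/queue structure and the maxsize value are my own choices, not anything from the original code, and for a loop body this cheap the thread overhead may well outweigh any gain.

import sys
import threading
try:
    import Queue as queue    # Python 2
except ImportError:
    import queue             # Python 3

def reader(path, q):
    # Producer: read lines and hand them to the consumer; blocking on disk
    # releases the GIL, so the consumer can run in the meantime.
    for line in open(path):
        q.put(line)
    q.put(None)              # sentinel: end of file

q = queue.Queue(maxsize=10000)   # bounded, so the reader cannot run arbitrarily far ahead
t = threading.Thread(target=reader, args=(sys.argv[1], q))
t.start()

data = []
while True:
    line = q.get()
    if line is None:
        break
    data.append(line[-1])    # the "computation" overlaps with the reader's IO waits

t.join()
print(data[-1])

In CPython the GIL prevents two threads from executing Python bytecode at once, but blocking read calls release the GIL, which is where any overlap comes from.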

One thing you could do is use posix_fadvise(2) to give the OS a hint that you plan to read the file sequentially (POSIX_FADV_SEQUENTIAL).

In some rough tests doing "wc -l" on a 600 MB file (an ISO), performance increased by about 20%. Each test was done immediately after clearing the disk cache.

For a Python interface to fadvise, see python-fadvise.
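As a rough sketch (assuming a recent Python, 3.3 or later, where os.posix_fadvise is in the standard library; on older Pythons you would go through a wrapper such as python-fadvise instead), the hint could be issued like this:

import os
import sys

f = open(sys.argv[1])
# Advise the kernel that we will read the whole file sequentially
# (offset 0, length 0 means "to the end of the file").
os.posix_fadvise(f.fileno(), 0, 0, os.POSIX_FADV_SEQUENTIAL)

data = []
for line in f:
    data.append(line[-1])
print(data[-1])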

How big are the lines in your file? If they're not very long (anything under about 1K probably qualifies) then you're likely seeing performance gains because of input buffering.
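If you want to check whether that explanation applies, a quick sketch of my own is to measure the average line length of the file:

import sys

total_bytes = 0
total_lines = 0
for line in open(sys.argv[1]):
    total_bytes += len(line)
    total_lines += 1
# Lines well under ~1K mean many lines per underlying read() call, i.e. the
# cost of the actual IO is amortized by the interpreter's input buffering.
print(total_bytes / float(total_lines))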

Why do you think list.append() would be a slow operation? It is extremely fast: the internal pointer array a list uses to hold references to its objects is allocated in increasingly large blocks, so most appends do not re-allocate the array at all; they simply increment the length counter, store a pointer, and incref the object.
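A rough way to see this over-allocation (a sketch of my own, using sys.getsizeof, which is available from Python 2.6 and reports the size of the list object including its pointer array):

import sys

data = []
last_size = 0
for i in range(64):
    data.append(None)
    size = sys.getsizeof(data)
    if size != last_size:
        # The size only jumps every few appends: the pointer array is grown
        # in blocks, so most appends just store a pointer and bump the length.
        print("%d items -> %d bytes" % (len(data), size))
        last_size = size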

I don't see any evidence that "data.append() is happening in parallel with the IO." Like Benji, I don't think this is automatic in the way you think. You showed that data.append(line[-1]) takes about the same amount of time as lc = lc + 1 (essentially no time at all, compared to the IO and line splitting). It's not really surprising that data.append(line[-1]) is very fast: one would expect the whole line to be in a fast cache, and as noted, append pre-allocates its buffers and only rarely has to reallocate. Moreover, line[-1] will always be '\n', except possibly for the last line of the file (no idea whether Python optimizes for this).

The only part I'm a little surprised about is that the xrange loop is so variable. I would expect it to always be faster, since there's no IO and you're not actually using the counter.

If your run times vary that much between runs of the second example, I suspect that your timing method or outside influences (other processes / system load) are skewing the times to the point where they don't give any reliable information.
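One way to get more trustworthy numbers (a sketch of my own, not from the original post) is to repeat the measurement a few times in-process and look at the minimum, which is the figure least affected by other system activity:

import time

def run():
    data = []
    for counter in xrange(65150224):   # use range() on Python 3
        data.append("a\n"[-1])
    return data[-1]

times = []
for _ in range(3):
    start = time.time()
    run()
    times.append(time.time() - start)
print(min(times))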
