
Why is this Python parallel loop taking longer than the sequential loop?

I have this code that I tried to parallelize based on a previous question. Here is the version using 2 processes.

import multiprocessing
import timeit

start_time = timeit.default_timer()

d1 = dict((i, (i * 0.1, i * 0.2, i * 0.3)) for i in range(500000))
d2={}

def fun1(gn):
    # look up the three values for this key and store their mean in d2
    x, y, z = d1[gn]
    d2.update({gn: (x + y + z) / 3})

if __name__ == '__main__':
    gen1 = [x for x in d1.keys()]
    #fun1(gen1)
    p = multiprocessing.Pool(2)
    p.map(fun1, gen1)
    print('Script finished')
    stop_time = timeit.default_timer()
    print(stop_time - start_time)

Output is:

Script finished
1.8478448875989333

If I change the program to run sequentially,

for gn in gen1:
    fun1(gn)
# p = multiprocessing.Pool(2)
# p.map(fun1, gen1)

the output is:

Script finished
0.8345944193950299

So the parallel loop is taking more than double the time of the sequential loop. (My computer has 2 cores and runs Windows.) I tried to find similar questions on the topic, this and this, but could not figure out the reason. How can I get a performance improvement using the multiprocessing module in this example?

When you call p.map(fun1, gen1), the 500,000 keys in gen1 are pickled, sent over to the worker processes, and unpickled there, and anything the workers return is shipped back the same way.

The computation done per key is tiny, so this serialization and inter-process communication costs far more than the work itself, which is why the parallel version is slower.
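
If the goal is an actual speedup, one common restructuring is sketched below (my own sketch, not code from the original post): have the worker return its result and pass a large chunksize to p.map, so the pickling cost is paid per chunk instead of per key. Note that d2.update inside a worker only modifies that worker's private copy of d2, so the parent's dict was never being filled in the parallel version anyway; returning results also fixes that. Even then, with per-key work this small, the overhead may still outweigh the gain on 2 cores.

import multiprocessing
import timeit

d1 = dict((i, (i * 0.1, i * 0.2, i * 0.3)) for i in range(500000))

def fun1(gn):
    # Return the result instead of mutating a global dict: each worker
    # process has its own copy of d2, so updates made there never
    # reach the parent.
    x, y, z = d1[gn]
    return gn, (x + y + z) / 3

if __name__ == '__main__':
    start_time = timeit.default_timer()
    with multiprocessing.Pool(2) as p:
        # A large chunksize amortizes the pickling/IPC cost over many
        # keys instead of paying it once per key.
        d2 = dict(p.map(fun1, d1.keys(), chunksize=50000))
    print('Script finished')
    print(timeit.default_timer() - start_time)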

You can measure or profile where the time is spent.
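
For example, a rough check (a sketch; it times only the parent-side pickling of the keys, while Pool.map also pickles results back and pays process start-up costs) could compare the serialization time against the computation itself:

import pickle
import timeit

d1 = dict((i, (i * 0.1, i * 0.2, i * 0.3)) for i in range(500000))
keys = list(d1.keys())

# Cost of serializing the task data once, roughly what Pool.map
# pays to ship the keys to the workers.
t_pickle = timeit.timeit(lambda: pickle.dumps(keys), number=1)

# Cost of doing the whole computation sequentially in one process.
t_compute = timeit.timeit(
    lambda: {k: sum(d1[k]) / 3 for k in keys}, number=1)

print('pickle: %.3fs, compute: %.3fs' % (t_pickle, t_compute))

If the pickling time is on the same order as the computation, there is little left for a second core to win back.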
