
Why is this Python parallel loop taking longer than the sequential loop?

I have this code that I tried to parallelize based on a previous question. Here is the version using 2 processes.

import multiprocessing
import timeit

start_time = timeit.default_timer()

d1 = dict((i, (i * 0.1, i * 0.2, i * 0.3)) for i in range(500000))
d2={}

def fun1(gn):
    # look up the three values for this key and store their mean in d2
    x, y, z = d1[gn]
    d2.update({gn: (x + y + z) / 3})

if __name__ == '__main__':
    gen1 = [x for x in d1.keys()]
    #fun1(gen1)
    p = multiprocessing.Pool(2)
    p.map(fun1, gen1)
    print('Script finished')
    stop_time = timeit.default_timer()
    print(stop_time - start_time)

Output is:

Script finished
1.8478448875989333

If I change the program to run sequentially,

for gn in gen1:
    fun1(gn)
# p = multiprocessing.Pool(2)
# p.map(fun1, gen1)

the output is:

Script finished
0.8345944193950299

So the parallel loop is taking more than double the time of the sequential loop. (My computer has 2 cores and runs Windows.) I tried to find similar questions on the topic, this and this, but could not figure out the reason. How can I get a performance improvement using the multiprocessing module in this example?

When you call p.map(fun1, gen1), the 500,000 keys in gen1 are pickled, sent over to the worker processes, and unpickled there, and anything the workers return is shipped back the same way.

The computation done per key is tiny, so this serialization and inter-process communication costs far more than the work itself, which is why the parallel version is slower.
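
If the goal is an actual speedup, one common restructuring is sketched below (my own sketch, not code from the original post): have the worker return its result and pass a large chunksize to p.map, so the pickling cost is paid per chunk instead of per key. Note that d2.update inside a worker only modifies that worker's private copy of d2, so the parent's dict was never being filled in the parallel version anyway; returning results also fixes that. Even then, with per-key work this small, the overhead may still outweigh the gain on 2 cores.

import multiprocessing
import timeit

d1 = dict((i, (i * 0.1, i * 0.2, i * 0.3)) for i in range(500000))

def fun1(gn):
    # Return the result instead of mutating a global dict: each worker
    # process has its own copy of d2, so updates made there never
    # reach the parent.
    x, y, z = d1[gn]
    return gn, (x + y + z) / 3

if __name__ == '__main__':
    start_time = timeit.default_timer()
    with multiprocessing.Pool(2) as p:
        # A large chunksize amortizes the pickling/IPC cost over many
        # keys instead of paying it once per key.
        d2 = dict(p.map(fun1, d1.keys(), chunksize=50000))
    print('Script finished')
    print(timeit.default_timer() - start_time)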

You can measure or profile where the time is spent.
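
For example, a rough check (a sketch; it times only the parent-side pickling of the keys, while Pool.map also pickles results back and pays process start-up costs) could compare the serialization time against the computation itself:

import pickle
import timeit

d1 = dict((i, (i * 0.1, i * 0.2, i * 0.3)) for i in range(500000))
keys = list(d1.keys())

# Cost of serializing the task data once, roughly what Pool.map
# pays to ship the keys to the workers.
t_pickle = timeit.timeit(lambda: pickle.dumps(keys), number=1)

# Cost of doing the whole computation sequentially in one process.
t_compute = timeit.timeit(
    lambda: {k: sum(d1[k]) / 3 for k in keys}, number=1)

print('pickle: %.3fs, compute: %.3fs' % (t_pickle, t_compute))

If the pickling time is on the same order as the computation, there is little left for a second core to win back.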
