I have this code that I tried to make parallel based on a previous question. Here is the code, using 2 processes:
import multiprocessing
import timeit

start_time = timeit.default_timer()
d1 = dict( (i, tuple([i*0.1, i*0.2, i*0.3])) for i in range(500000) )
d2 = {}

def fun1(gn):
    x, y, z = d1[gn]
    d2.update({gn: ((x + y + z) / 3)})

if __name__ == '__main__':
    gen1 = [x for x in d1.keys()]
    #for gn in gen1: fun1(gn)   # sequential version
    p = multiprocessing.Pool(2)
    p.map(fun1, gen1)
    print('Script finished')
    stop_time = timeit.default_timer()
    print(stop_time - start_time)
Output is:
Script finished
1.8478448875989333
If I change the program to sequential,

for gn in gen1:
    fun1(gn)
#p = multiprocessing.Pool(2)
#p.map(fun1, gen1)
output is:
Script finished
0.8345944193950299
So the parallel loop is taking more time than the sequential loop, more than double. (My computer has 2 cores and runs Windows.) I tried to find similar questions on the topic (this and this) but could not figure out the reason. How can I get a performance improvement using the multiprocessing module in this example?
When you do p.map(fun1, gen1), you send gen1 over to the worker processes. This includes serializing (pickling) the list, which is 500,000 elements big, and shipping every key to a worker one task at a time. Compared to the small per-key computation, that serialization takes much longer, so the interprocess overhead dominates the runtime.
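
As a minimal sketch (my own addition, not part of the original answer): each worker process also gets its own copy of d2, so updates made inside fun1 never reach the parent anyway. Returning results from the workers and batching the keys with a large chunksize sidesteps both problems; the chunksize value of 50000 below is an arbitrary illustration:

import multiprocessing
import timeit

d1 = dict( (i, tuple([i*0.1, i*0.2, i*0.3])) for i in range(500000) )

def fun1(gn):
    # return the result instead of updating a global dict,
    # since worker-side updates are lost to the parent process
    x, y, z = d1[gn]
    return gn, (x + y + z) / 3

if __name__ == '__main__':
    start_time = timeit.default_timer()
    with multiprocessing.Pool(2) as p:
        # chunksize batches the keys so they are pickled in a few
        # large messages instead of 500,000 tiny ones
        d2 = dict(p.map(fun1, d1.keys(), chunksize=50000))
    print(timeit.default_timer() - start_time)

Even so, the per-key arithmetic is so cheap here that the sequential loop may still win; multiprocessing pays off when each task does substantially more work than the cost of shipping its inputs and outputs.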
You can measure or profile where the time is spent.
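
For instance, a rough comparison (reusing d1 and gen1 from the question; the helper avg3 is just an illustrative stand-in for fun1 without the dict update) is to time how long pickling the argument list takes versus the computation itself:

import pickle
import timeit

d1 = dict( (i, tuple([i*0.1, i*0.2, i*0.3])) for i in range(500000) )
gen1 = [x for x in d1.keys()]

def avg3(gn):
    # same arithmetic as fun1, minus the dict update
    x, y, z = d1[gn]
    return (x + y + z) / 3

# roughly the serialization work Pool.map incurs (in reality it
# pickles the keys in chunks, plus the results coming back)
print('pickling:', timeit.timeit(lambda: pickle.dumps(gen1), number=1))

# the actual computation, done sequentially in one process
print('computing:', timeit.timeit(lambda: [avg3(gn) for gn in gen1], number=1))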