Multiprocessing slower than sequential with Python

I've invested quite a lot of time in rewriting my code to exploit more cores, but when I benchmarked it I found that all I had achieved was making it 7 times slower than the original code, despite running on 16 cores rather than one! This leads me to believe that I must be doing something wrong.

The code is 4000+ lines and needs a number of pretty heavy input files, so I'm not going to be able to post something that reproduces the problem. However, I can say that the function I'm calling typically takes 0.1s to run and calls some C libraries using ctypes. It is also passed a fair amount of data in memory - maybe 1 MB? Some pseudocode showing the slow bit:

    import multiprocessing as mp

    def AnalyseSection(Args):
        Section,SectionNo,ElLoads,ElLoadsM,Materials,CycleCount,FlapF,EdgeF,Scaling,Time,FlapFreq,EdgeFreq=Args
        for i in range(len(Section[Elements])):
            ...  # Do some heavy lifting with ctypes
        return Result

    for i in range(10):
        for j in range(10):
            for k in range(10):
                Args=[(Sections[i],SectionList[i],ElLoads,ElLoadsM,Materials,CycleCount,FlapF,EdgeF,Scaling,Time,FlapFreq,EdgeFreq) for n in SectionList]
                pool=mp.Pool(processes=NoCPUs,maxtasksperchild=1)
                result = pool.map(AnalyseSection,Args)
                pool.close()
                pool.join()

I was hoping someone could spot an obvious error that's causing it to run so much more slowly. The function takes a while to run (typically 0.1s per call), so I wouldn't have thought that the overhead associated with multiprocessing could slow it down this much. Any help would be much appreciated!
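In case it helps, here is the kind of overhead check I have in mind - a minimal, self-contained sketch where the payload and the Trivial function are just placeholders, not my real code. It times pickling a roughly 1 MB argument tuple once, and then maps a do-nothing function over 100 such tuples with a fresh pool, which should show how much of each 0.1s call could be eaten by serialisation and process start-up:

    import multiprocessing as mp
    import pickle
    import time

    def Trivial(Args):
        # Does no real work, so any time measured here is pure multiprocessing overhead
        return len(Args)

    if __name__ == "__main__":
        # Placeholder payload of roughly 1 MB, standing in for my real argument tuple
        Payload = (b"x" * 1_000_000,)
        Args = [Payload for _ in range(100)]

        # Cost of serialising one task's arguments (paid for every task pool.map sends)
        t0 = time.perf_counter()
        pickle.dumps(Payload)
        print("pickle one task:", time.perf_counter() - t0, "s")

        # Cost of a fresh pool plus a map of 100 do-nothing tasks, as in my current loop
        t0 = time.perf_counter()
        pool = mp.Pool(processes=16, maxtasksperchild=1)
        pool.map(Trivial, Args)
        pool.close()
        pool.join()
        print("fresh pool + 100 no-op tasks:", time.perf_counter() - t0, "s")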

This

    for i in range(10):
        for j in range(10):
            for k in range(10):
                Args=[(Sections[i],SectionList[i],ElLoads,ElLoadsM,Materials,CycleCount,FlapF,EdgeF,Scaling,Time,FlapFreq,EdgeFreq) for n in SectionList]
                pool=mp.Pool(processes=NoCPUs,maxtasksperchild=1)
                result = pool.map(AnalyseSection,Args)
                pool.close()
                pool.join()

can and should be transformed to this

    pool=mp.Pool(processes=NoCPUs)

    for i in range(10):
        for j in range(10):
            for k in range(10):
                Args=[(Sections[i],SectionList[i],ElLoads,ElLoadsM,Materials,CycleCount,FlapF,EdgeF,Scaling,Time,FlapFreq,EdgeFreq) for n in SectionList]
                result = pool.map(AnalyseSection,Args)

    pool.close()
    pool.join()

This is more in line with what you are trying to achieve: you create one multiprocessing Pool, feed it the data, and wait for the results. You don't have to start and stop the pool on every iteration.

Keep in mind that there is a cost associated with starting processes (much bigger than for threads, if that's what you are used to).
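As a rough illustration (a minimal sketch with a dummy Work function standing in for AnalyseSection, and made-up batch sizes), you can compare creating a fresh pool per batch against reusing a single pool:

    import multiprocessing as mp
    import time

    def Work(x):
        # Dummy stand-in for AnalyseSection: a small amount of real CPU work per task
        return sum(i * i for i in range(10_000)) + x

    if __name__ == "__main__":
        Batches = [list(range(20)) for _ in range(50)]

        # Fresh pool per batch (what the original code does)
        t0 = time.perf_counter()
        for Batch in Batches:
            pool = mp.Pool(processes=4, maxtasksperchild=1)
            pool.map(Work, Batch)
            pool.close()
            pool.join()
        print("fresh pool per batch:", time.perf_counter() - t0, "s")

        # One pool reused for every batch
        t0 = time.perf_counter()
        pool = mp.Pool(processes=4)
        for Batch in Batches:
            pool.map(Work, Batch)
        pool.close()
        pool.join()
        print("single reused pool:", time.perf_counter() - t0, "s")

On most machines the reused-pool version comes out noticeably faster, because every new Pool has to start its worker processes from scratch (and, with the spawn start method, re-import the parent module), while the reused pool pays that cost only once.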
