The different performance of Python multiprocessing.Pool on MacOS and Linux systems

Question

I'm a beginner in Python. I used multiprocessing.Pool in my project to improve performance.

Here's a snippet of the code where I use multiprocessing.Pool.

I build the pools when my long-running server starts, and call Pool.apply_async every time the server gets a request:

import multiprocessing as mp
from multiprocessing import Pool
# build the pools once, when the server starts
mp.set_start_method('forkserver')
self._driver_pool = Pool(processes=10)
self._executor_pool = Pool(processes=30)
# use the pool every time a request arrives
driver = driver_class(driver_context, init_table, self._manager, **kwargs_dict)
future = self._driver_pool.apply_async(driver.run)
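Passing driver.run to apply_async sends a bound method to a worker. Pickling a bound method also pickles the instance it is bound to, so the whole driver object, including init_table, is serialized for every request. Below is a minimal sketch for checking roughly how large that payload is. Driver and payload_size are hypothetical stand-ins, not the question's real driver_class, and plain pickle is used as an approximation of what the pool sends.

```python
import pickle


class Driver:
    """Hypothetical stand-in for the question's driver_class."""

    def __init__(self, context, init_table):
        self.context = context
        self.init_table = init_table  # serialized along with every bound method

    def run(self):
        return len(self.init_table)


def payload_size(callable_obj):
    """Approximate number of bytes pickled when this callable is sent to a worker."""
    return len(pickle.dumps(callable_obj))


if __name__ == "__main__":
    small = Driver({"id": 1}, list(range(10)))
    big = Driver({"id": 1}, list(range(100_000)))
    print(payload_size(small.run), payload_size(big.run))
```

If the payload turns out to be large and mostly identical across requests, one option is to build that data once per worker and send only the per-request arguments.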

I tested the code on my own computer, which runs MacOS, and then deployed it on a Linux machine.

I found that on MacOS the Pool.apply_async call costs roughly 10 ms, but the same code on Linux costs about 2 s.

I don't understand why there is such a big difference in performance. Is there something wrong with the way I use multiprocessing.Pool?
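Pool.apply_async normally returns almost immediately, because it only queues the task and hands back an AsyncResult. The time until the result is ready is a separate number. The sketch below measures both figures separately so you can tell which one is 10 ms versus 2 s. handle_request and time_request are hypothetical names, and the CPU-bound loop merely stands in for driver.run.

```python
import multiprocessing as mp
import time


def handle_request(n):
    """Hypothetical task standing in for driver.run: some CPU work."""
    return sum(i * i for i in range(n))


def time_request(pool, n):
    """Return (seconds spent inside apply_async, seconds until the result is ready)."""
    start = time.perf_counter()
    async_result = pool.apply_async(handle_request, (n,))
    dispatched = time.perf_counter() - start  # apply_async only queues the task
    async_result.get()  # blocks until a worker has finished the task
    finished = time.perf_counter() - start
    return dispatched, finished
```

To try it with the question's setup, call mp.set_start_method("forkserver") and then time_request(pool, 200_000) in a loop on an mp.Pool. This must happen inside an if __name__ == "__main__": block, which the forkserver and spawn start methods require.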

Answer 1

After some tests, I have a conjecture.

What I observe is that when the pool size is set to 30, the first 30 requests are slow, but after that the time each task takes drops significantly.
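One way to test this observation is to record, for every request, which worker served it and whether that worker was still "cold". If the slowdown is a one-off cost paid by each worker process, there should be exactly one slow request per worker, so at most as many slow requests as the pool has processes. Below is a minimal sketch of that check. expensive_setup, task, run_demo and summarize are hypothetical, and the sleep is only a stand-in for whatever the real first-call cost is.

```python
import multiprocessing as mp
import os
import time

# Per-process flag: every worker process has its own copy, starting as False.
_setup_done = False


def expensive_setup():
    """Hypothetical stand-in for a one-off cost such as importing heavy modules."""
    time.sleep(0.05)


def task(_request):
    """Return (pid, was_cold, seconds) for one request."""
    global _setup_done
    start = time.perf_counter()
    was_cold = not _setup_done
    if was_cold:
        expensive_setup()
        _setup_done = True
    return os.getpid(), was_cold, time.perf_counter() - start


def run_demo(processes=4, n_tasks=16, start_method="forkserver"):
    """Send n_tasks requests through a pool and return the per-request results."""
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=processes) as pool:
        return pool.map(task, range(n_tasks), chunksize=1)


def summarize(results):
    """Return (number of cold requests, number of distinct worker processes)."""
    cold = sum(1 for _, was_cold, _ in results if was_cold)
    workers = len({pid for pid, _, _ in results})
    return cold, workers
```

With the forkserver start method, call run_demo from under if __name__ == "__main__":. The number of cold requests should equal the number of distinct worker pids.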

On MacOS, I compared performance with and without the .pyc files and found that the cost rises after I delete them.
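If missing .pyc files are part of the cost, they can be generated ahead of time instead of on first import. Below is a minimal sketch using the standard compileall module, which does the same job as running python -m compileall on the project directory at deploy time. precompile, cached_pyc_path and demo are hypothetical helper names, and the throwaway module only exists for the demonstration.

```python
import compileall
import importlib.util
import os
import tempfile


def precompile(project_dir):
    """Write .pyc files for every .py file under project_dir; True if all compiled."""
    return bool(compileall.compile_dir(project_dir, quiet=1))


def cached_pyc_path(source_path):
    """Path of the .pyc that the import system looks for (__pycache__ layout)."""
    return importlib.util.cache_from_source(source_path)


def demo():
    """Precompile a throwaway module in a temporary directory.

    Returns (compiled_ok, pyc_exists).
    """
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "example_mod.py")  # hypothetical module
        with open(source, "w") as f:
            f.write("VALUE = 42\n")
        ok = precompile(tmp)
        return ok, os.path.exists(cached_pyc_path(source))
```

On the Linux machine it may also be worth checking whether the server user can write to the deployed __pycache__ directories. If it cannot, Python cannot save the compiled bytecode, and each new process compiles the sources again.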

I suspect there are several possible reasons for the performance differences:

When a process is started with the 'forkserver' method, it has to load all of its resources, including the modules it imports. The process therefore looks for the .pyc files, and if they are missing, it first compiles the Python source files to .pyc and then loads them.
The worker processes in a Pool are never released (with the default maxtasksperchild=None), so once a process has loaded the .pyc files into memory, it never has to load them again.
The Mac has an SSD, so a process on the Mac loads .pyc files faster than a process on a machine without an SSD.
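The second point depends on the pool's maxtasksperchild setting. With the default of None, workers stay alive and keep whatever they have already loaded. With maxtasksperchild=1, every task runs in a freshly started process and would pay any warm-up cost again. Below is a minimal sketch contrasting the two settings. worker_pid and distinct_workers are hypothetical names.

```python
import multiprocessing as mp
import os


def worker_pid(_request):
    """Report which worker process handled the request."""
    return os.getpid()


def distinct_workers(n_tasks, processes=2, maxtasksperchild=None, start_method="forkserver"):
    """Run n_tasks tiny tasks and return how many different worker processes served them."""
    ctx = mp.get_context(start_method)
    with ctx.Pool(processes=processes, maxtasksperchild=maxtasksperchild) as pool:
        pids = [pool.apply_async(worker_pid, (i,)).get() for i in range(n_tasks)]
    return len(set(pids))
```

With the default setting, the result should never exceed processes. With maxtasksperchild=1, it should be close to n_tasks. As before, call it from under if __name__ == "__main__": when using forkserver.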

So the question I'm now facing is whether there is a way to preload resources for processes started with the 'forkserver' method to get better performance.
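The standard library offers two hooks that fit this question. The first is multiprocessing.set_forkserver_preload(module_names), which asks the forkserver process to import the listed modules once, so that processes forked from it inherit them already imported. ImportError for any listed name is silently ignored. The second is Pool's initializer/initargs, which runs a function once in each worker when the pool starts it. Below is a minimal sketch combining both. PRELOAD_MODULES, init_worker, handle and make_pool are hypothetical, and json/decimal are only stand-ins for the modules the real driver code imports.

```python
import multiprocessing as mp

# Hypothetical list: replace json/decimal with the modules your driver code imports.
# "__main__" asks the forkserver to preload the main script itself as well.
PRELOAD_MODULES = ["__main__", "json", "decimal"]

_worker_state = None  # filled in once per worker by the initializer


def init_worker(table_size):
    """Runs once in each worker process when the pool starts it, before any task."""
    global _worker_state
    _worker_state = {"table": list(range(table_size))}  # hypothetical per-worker cache


def handle(request_id):
    """Hypothetical request handler that relies on the pre-built worker state."""
    return _worker_state is not None, request_id


def make_pool(processes, table_size=1000, start_method="forkserver"):
    """Create a pool whose workers are warmed up before they receive requests."""
    ctx = mp.get_context(start_method)
    if start_method == "forkserver":
        # Must run before the forkserver process starts, i.e. before the first pool.
        mp.set_forkserver_preload(PRELOAD_MODULES)
    return ctx.Pool(processes=processes, initializer=init_worker, initargs=(table_size,))
```

In the question's setup, this would mean calling mp.set_forkserver_preload([...]) right after mp.set_start_method('forkserver') and before building self._driver_pool and self._executor_pool. A Pool starts all of its workers when it is constructed, so the initializer's work should happen around server start-up rather than on the first requests. Whether this removes the 2 s on Linux depends on what the real first-call cost turns out to be.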

solution1
0 2021-01-26 14:16:54