
Why reinitialize multiprocess workers?

I found this note in the docs for Python's multiprocessing Pool:

Worker processes within a Pool typically live for the complete duration of the Pool's work queue. A frequent pattern found in other systems (such as Apache, mod_wsgi, etc) to free resources held by workers is to allow a worker within a pool to complete only a set amount of work before being exiting, being cleaned up and a new process spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user.

It says that Apache and other systems reinitialize their worker processes, which suggests this is a good pattern. Is it about reinitializing objects (like "OOP" objects) to trigger garbage collection? Why is it important? Can the GC not run while a multiprocessing object exists?

One of the main use cases is to prevent small leaks (memory, file descriptors, ...) from affecting long-running services.

As software is not perfect, it is often the case that some library does not clean up properly after itself. Since these issues are out of the developer's control, an easy fix is to periodically terminate the leaking processes and start fresh.

Another use case is to control the memory consumption of your service. Memory fragmentation in Python, for example, often causes workers to consume much more memory than they actually need.
