Constantly running Pool of workers

Question

I'm using multiprocessing.Pool to parallelize the processing of some files. The code waits for a file to be received, then sends that file to a worker using Pool.apply_async, and the worker processes the file.

This code is supposed to run indefinitely, so I never close the pool. However, this causes the pool to consume a lot of memory over time.

The code is something like this:

from multiprocessing import Pool

if __name__ == "__main__":
    # Keep one pool alive and hand each incoming file to a worker.
    with Pool(processes=PROCESS_COUNT) as pool:
        while True:
            f = wait_for_file()
            pool.apply_async(process_file, (f,))

How can I prevent the high memory usage without closing the pool?

Answer 1

Yes: if you allocate resources (spawned processes, or simply a chunk of memory) and never release them, there is less left for other tasks on your machine until you or the operating system releases them, voluntarily or by force.

You may want to pass the maxtasksperchild argument to Pool so that workers are killed and replaced periodically. If a worker allocates memory and there is a leak somewhere, this at least reclaims some resources.

Note: Worker processes within a Pool typically live for the complete duration of the Pool's work queue. A frequent pattern found in other systems (such as Apache, mod_wsgi, etc.) to free resources held by workers is to allow a worker within a pool to complete only a set amount of work before exiting, being cleaned up, and a new process being spawned to replace the old one. The maxtasksperchild argument to the Pool exposes this ability to the end user.
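
As a rough illustration of the note above, here is a minimal sketch of a pool whose workers are recycled after a fixed number of tasks. The function names (process_file, run_batch), the file names, and the values 2 and 3 are made up for the example and are not from the question. process_file here only returns the worker's PID so that the recycling is visible.

```python
import os
from multiprocessing import Pool


def process_file(path):
    # Hypothetical stand-in for the real per-file work.
    # Returns the PID of the worker process that handled the task.
    return os.getpid()


def run_batch(paths, process_count=2, max_tasks=3):
    # Each worker exits after completing max_tasks tasks and is replaced
    # by a fresh process, so memory leaked inside a worker is handed back
    # to the operating system when that worker exits.
    with Pool(processes=process_count, maxtasksperchild=max_tasks) as pool:
        results = [pool.apply_async(process_file, (p,)) for p in paths]
        return [r.get() for r in results]


if __name__ == "__main__":
    pids = run_batch([f"file_{i}" for i in range(12)])
    print(len(set(pids)), "distinct worker processes handled 12 files")
```

In the always-running loop from the question, the same argument would presumably be passed when the pool is created, i.e. Pool(processes=PROCESS_COUNT, maxtasksperchild=N) for some N chosen by experiment. This bounds how long a leak inside a worker can accumulate, but it does not fix the leak itself.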

Alternatively, don't roll your own pool implementation: until it matures it will be buggy, and you'll burn time unnecessarily. Instead use e.g. Celery (tutorial), which hopefully even has tests for the nasty corner cases you might otherwise spend more time on than necessary.

Or, if you want to experiment a bit, here is a similar question which provides steps for custom worker pool management.

Solution 1
2 · Accepted · 2021-07-16 15:53:38
