
How to limit the workers' throughput in imap_unordered?

I'm using imap_unordered from the multiprocessing library to parallelize some data processing computations. The problem is that sometimes the master process, which reads the returned iterator, consumes the computed results more slowly than the workers produce them (because of network/disk speed limits, etc.). This leads to the program consuming all available memory and crashing.

I'd expect the returned iterator to have some internal size limit, so that when it is consumed too slowly, the internal queue fills up and blocks the producers (the asynchronous workers). But apparently this is not the case.

What would be the easiest way to achieve such behavior?

You might want to consider using a Queue:

import multiprocessing  # Don't use queue.Queue!

MAX_QUEUE_SIZE = 20

q = multiprocessing.Queue(MAX_QUEUE_SIZE)  # Inserts will block if the queue is full
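To see the back-pressure concretely: a `put()` on a full bounded queue blocks by default, and raises `queue.Full` if you give it a timeout. A quick single-process sketch (the capacity and item values are arbitrary):

```python
import multiprocessing
import queue  # multiprocessing.Queue raises queue.Full on a timed-out put

q = multiprocessing.Queue(2)   # bounded queue with room for two items
q.put("a")
q.put("b")                     # queue is now at capacity

blocked = False
try:
    q.put("c", timeout=0.1)    # would block indefinitely without a timeout
except queue.Full:
    blocked = True
print(blocked)  # → True
```

In the worker pattern below there is no timeout, so a worker simply sleeps inside `put()` until the master drains the queue.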

And then, in your master process:

while True:
    do_something_with(q.get())  # get() blocks until an item is available

And in your child processes:

while True:
    q.put(create_something())  # put() blocks while the queue is full

You'll have to rewrite a bit of the machinery (i.e., you won't be able to use imap_unordered anymore), but that should be reasonably trivial using Pool's lower-level methods.

