Python utilizing multiple processors

Let's say I have a big list of music of varying length that needs to be converted, or images of varying sizes that need to be resized, or something like that. The order doesn't matter, so it is perfect for splitting across multiple processors.

If I use multiprocessing.Pool's map function, it seems like all the work is divided up ahead of time and doesn't take into account the fact that some files may take longer to process than others.

What happens is that if I have 12 processors... near the end of processing, 1 or 2 processors will have 2 or 3 files left to process while other processors that could be utilized sit idle.

Is there some sort of queue implementation that can keep all processors loaded until there is no more work left to do?
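For reference, here is roughly what I am doing now; a minimal sketch, with a placeholder convert() standing in for the real per-file work:

from multiprocessing import Pool
from glob import glob

def convert(path):
    pass  # placeholder for the actual (variable-duration) conversion

if __name__ == '__main__':
    pool = Pool(12)
    # map() splits the input into chunks up front, so a worker that
    # gets a chunk of slow files can leave the other workers idle
    pool.map(convert, glob('*.mp3'))
    pool.close()
    pool.join()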

There is a Queue class within the multiprocessing module specifically for this purpose.
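A minimal sketch of that approach, assuming a hypothetical convert() doing the per-file work: each worker pulls the next path off a shared Queue and exits on a None sentinel, so every processor stays busy until the queue is drained.

from multiprocessing import Process, Queue
from glob import glob

def convert(path):
    pass  # hypothetical per-file work

def worker(q):
    while True:
        path = q.get()
        if path is None:  # sentinel: no more work
            break
        convert(path)

if __name__ == '__main__':
    q = Queue()
    procs = [Process(target=worker, args=(q,)) for _ in range(12)]
    for p in procs:
        p.start()
    for path in glob('*.mp3'):
        q.put(path)
    for _ in procs:
        q.put(None)  # one sentinel per worker
    for p in procs:
        p.join()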

Edit: If you are looking for a complete framework for parallel computing which features a map() function using a task queue, have a look at the parallel computing facilities of IPython. In particular, you can use the TaskClient.map() function to get a load-balanced mapping to the available processors.
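A rough sketch of what that looks like; the import path below is an assumption based on the legacy (pre-0.11) IPython parallel API, so check the docs for your version, and it assumes an IPython cluster has already been started.

from IPython.kernel import client  # assumption: legacy IPython import path
from glob import glob

def convert(path):
    pass  # hypothetical per-file work

tc = client.TaskClient()  # assumes an ipcluster is already running
# TaskClient.map() is load-balanced: each idle engine pulls the next item
results = tc.map(convert, glob('*.mp3'))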

This is trivial to do with jug:

from glob import glob
from jug import Task

def process_image(img):
    ...  # resize or convert the image here

images = glob('*.jpg')
for im in images:
    Task(process_image, im)

Now, just run jug execute a few times to spawn worker processes.

About queue implementations. There are some.

Look at the Celery project. http://celeryproject.org/

So, in your case, you can run 12 conversions (one on each CPU) as Celery tasks, add a callback function (to the conversion or to the task), and in that callback function add a new conversion task to run when one of the previous conversions has finished.
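A minimal sketch of the task side, assuming a Redis broker (the broker URL below is an assumption) and a hypothetical convert() doing the per-file work; with more queued tasks than workers, each worker simply pulls the next task from the broker as it finishes:

from celery import Celery
from glob import glob

app = Celery('tasks', broker='redis://localhost:6379/0')  # assumption: Redis broker

@app.task
def convert(path):
    pass  # hypothetical per-file work

# Enqueue one task per file; running workers pull tasks off the
# broker queue as they free up, so none sits idle while work remains.
for path in glob('*.mp3'):
    convert.delay(path)

With current Celery versions the workers can then be started with something like celery -A tasks worker --concurrency=12.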

The Python threading library that has brought me the most joy is Parallel Python (PP). It is trivial with PP to use a thread pool approach with a single queue to achieve what you need.
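A minimal sketch, assuming a hypothetical convert() function; pp.Server() autodetects the number of CPUs, and its internal queue hands each worker a new job as soon as it finishes the previous one:

import pp
from glob import glob

def convert(path):
    pass  # hypothetical per-file work

job_server = pp.Server()  # autodetects the number of CPUs
jobs = [job_server.submit(convert, (path,)) for path in glob('*.mp3')]
results = [job() for job in jobs]  # calling a job waits for and returns its result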

This is not the case if you use Pool.imap_unordered.
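A minimal sketch of the difference, with a hypothetical convert() standing in for the per-file work: with chunksize=1 (the default for imap_unordered), each worker is handed one file at a time and grabs the next as soon as it finishes, and results are yielded in completion order:

from multiprocessing import Pool
from glob import glob

def convert(path):
    return path  # hypothetical per-file work; return value identifies the file

if __name__ == '__main__':
    pool = Pool(12)
    # chunksize=1 (the default here) hands out one file at a time, so a
    # worker that finishes early immediately picks up the next file;
    # results arrive in completion order, not submission order
    for done in pool.imap_unordered(convert, glob('*.mp3')):
        print(done)
    pool.close()
    pool.join()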
