How to specify number of workers in Dask.array

Suppose you want to specify the number of workers in Dask.array. As the Dask documentation shows, you can set:

dask.set_options(pool=ThreadPool(num_workers)) 

This works well for some simulations I've run (Monte Carlo, for example), but for some linear algebra operations Dask seems to override the user-specified configuration. For example:

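import dask
import dask.array as da
from multiprocessing.pool import ThreadPool

# Example values; the question does not state the ones actually used.
num_workers = 4
size = 10000
chunk_size = 1000

# Newer Dask releases spell this dask.config.set(pool=...).
dask.set_options(pool=ThreadPool(num_workers))
mat1 = da.random.random((size, size), chunks=chunk_size)
mat2 = da.random.random((size, size), chunks=chunk_size)
mat3 = mat1.dot(mat2)
mat3.compute()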

If I run that program with a small matrix size, it apparently uses only num_workers workers, but if I increase the matrix size, it suddenly creates dozens of workers, as the image shows. [image]

So, how can I ask Dask to solve the problem using only num_workers workers?

When using the threaded scheduler, Dask doesn't spawn any new processes. Instead it runs everything within your main process.
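To make the thread-versus-process distinction concrete, here is a minimal sketch that uses only the standard library, not Dask itself. It runs tasks on the same kind of `ThreadPool` that the question hands to Dask and records where each task executed. The function name `where_am_i` and the value of `num_workers` are made up for illustration.

```python
import os
import threading
from multiprocessing.pool import ThreadPool


def where_am_i(_):
    """Return the process id and thread id of whichever worker runs this task."""
    return os.getpid(), threading.get_ident()


num_workers = 4  # hypothetical value, chosen only for the demonstration

with ThreadPool(num_workers) as pool:
    results = pool.map(where_am_i, range(100))

pids = {pid for pid, _ in results}
thread_ids = {tid for _, tid in results}

# Every task reports the PID of the main process; only the thread ids differ,
# and there are never more distinct thread ids than pool workers.
print(pids == {os.getpid()})
print(len(thread_ids) <= num_workers)
```

A pool like this adds threads inside the existing process, so any extra workers that show up have to come from somewhere else, such as the library code each task calls.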

However, this doesn't stop your functions from spawning processes themselves. As Mike Graham points out in the comments, you should be careful about mixing a parallel solution like Dask with a parallel BLAS implementation like MKL or OpenBLAS. The two thread pools then compete for the same cores, which can hurt performance. It is often best to set one of the two libraries to use a single thread per call.
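One way to follow that advice is to pin the BLAS library to a single thread and let Dask's thread pool supply the parallelism. The sketch below is an assumption about a typical setup, not something taken from the original answer. It sets the environment variables that OpenMP, MKL, and OpenBLAS builds commonly read, which only has an effect if it happens before NumPy is first imported in the process. It also uses `dask.config.set`, which replaced `dask.set_options` in newer Dask releases. The function name `dot_with_fixed_pool` is hypothetical.

```python
import os

# Assumption: these are the variables the usual BLAS backends read at load time.
# They have no effect if NumPy has already been imported in this process.
for var in ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS"):
    os.environ[var] = "1"


def dot_with_fixed_pool(size, chunk_size, num_workers):
    """Multiply two random matrices on a Dask thread pool of num_workers threads."""
    # Imported here so the environment variables above are already in place.
    from multiprocessing.pool import ThreadPool

    import dask
    import dask.array as da

    mat1 = da.random.random((size, size), chunks=chunk_size)
    mat2 = da.random.random((size, size), chunks=chunk_size)

    # The pool is handed to the threaded scheduler only for this computation
    # and is shut down when the with-block exits.
    with ThreadPool(num_workers) as pool, dask.config.set(pool=pool):
        return mat1.dot(mat2).compute()
```

Calling `dot_with_fixed_pool(4000, 1000, 4)`, for instance, should keep the computation on four Dask threads, each making single-threaded BLAS calls. The opposite choice, one Dask worker with a multi-threaded BLAS, is equally valid and may be faster when the chunks are large.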

I am still confused about why you're seeing multiple Python processes. To the best of my knowledge, neither threaded Dask nor MKL creates new processes for computation. However, given your positive results from limiting the number of MKL threads, perhaps MKL has changed since I last checked in with it.
