How to specify number of workers in Dask.array

Suppose you want to specify the number of workers in Dask.array. As the Dask documentation shows, you can set:

dask.set_options(pool=ThreadPool(num_workers)) 

This works well for some simulations I've run (Monte Carlo, for example), but with some linear algebra operations Dask seems to override the user-specified configuration. For example:

import dask.array as da
import dask
from multiprocessing.pool import ThreadPool

dask.set_options(pool=ThreadPool(num_workers))
mat1 = da.random.random((size, size), chunks=chunk_size)
mat2 = da.random.random((size, size), chunks=chunk_size)
mat3 = mat1.dot(mat2)
mat3.compute()

If I run this program with a small matrix size, it apparently uses only num_workers workers, but if I increase the matrix size, it suddenly creates dozens of workers, as the image shows.

So, how can I request Dask to solve the problem using only num_workers workers?

When using the threaded scheduler, Dask doesn't spawn any new processes. Instead it runs everything within your main process.
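
A small sketch of what "within your main process" means, using only the standard library's ThreadPool (the same class the question passes to Dask). It does not involve Dask itself; the helper name where_am_i is made up for illustration.

```python
import os
import threading
from multiprocessing.pool import ThreadPool


def where_am_i(_):
    # Report the process id and thread id that executed this task.
    return os.getpid(), threading.get_ident()


# Four worker threads, sixteen small tasks.
with ThreadPool(4) as pool:
    results = pool.map(where_am_i, range(16))

pids = {pid for pid, _ in results}
thread_ids = {tid for _, tid in results}

# Every task ran inside the current process; only the thread ids vary.
print("process ids seen:", pids)
print("distinct worker threads used:", len(thread_ids))
```

If extra activity shows up in a system monitor while such a pool is running, it has to come from something the tasks themselves call, not from the pool.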

However, this doesn't stop your functions from spawning threads or processes themselves. As Mike Graham points out in the comments, you should be careful about mixing parallel solutions like Dask with a parallel BLAS implementation like MKL or OpenBLAS. Each Dask worker may call into a BLAS routine that starts its own threads, and this can hurt performance. It is often best to set one of the two libraries to use a single thread per call.
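
One common way to do this is to cap the BLAS thread count through environment variables before NumPy is first imported. The sketch below assumes your BLAS build honors OMP_NUM_THREADS, MKL_NUM_THREADS, or OPENBLAS_NUM_THREADS, which depends on how NumPy was built. The helper names limit_blas_threads and run_dot are hypothetical, and dask.config.set is used on the assumption that you are on a newer Dask release, where it replaced dask.set_options.

```python
import os

# Environment variables commonly read by OpenMP, MKL and OpenBLAS.
# Which of them actually takes effect depends on the BLAS your NumPy links to.
BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "MKL_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def limit_blas_threads(n=1):
    """Ask the BLAS library to use at most n threads per call.

    This must run before NumPy (and therefore Dask) is imported,
    because the libraries read these variables when they load.
    """
    for var in BLAS_THREAD_VARS:
        os.environ[var] = str(n)
    return {var: os.environ[var] for var in BLAS_THREAD_VARS}


limits = limit_blas_threads(1)


def run_dot(size=2000, chunk_size=500, num_workers=4):
    # Imports are deferred so they happen after the limits above are set.
    from multiprocessing.pool import ThreadPool

    import dask
    import dask.array as da

    mat1 = da.random.random((size, size), chunks=chunk_size)
    mat2 = da.random.random((size, size), chunks=chunk_size)
    mat3 = mat1.dot(mat2)

    # Dask provides the parallelism: num_workers threads, one BLAS thread each.
    with ThreadPool(num_workers) as pool:
        with dask.config.set(pool=pool):
            return mat3.compute()
```

Calling run_dot() should then keep the total thread count close to num_workers. The opposite choice, a single Dask worker with a multithreaded BLAS, is also reasonable when the chunks are large.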

I am still confused about why you're seeing multiple Python processes. To the best of my knowledge, neither threaded Dask nor MKL creates new processes for computation. However, given your positive results from limiting the number of MKL threads, perhaps MKL has changed since I last checked in with it.

