
dask: specify number of processes

I am trying to use dask to do some embarrassingly parallel processing. For some reason I have to use dask, even though the task could easily be achieved with multiprocessing.Pool(5).map.

For example:

import dask
from dask import compute, delayed

def do_something(x):
    return x * x

data = range(10)
delayed_values = [delayed(do_something)(x) for x in data]
results = compute(*delayed_values, scheduler='processes')

It works, but apparently it uses only one process.

How can I configure dask so it uses a pool of 5 processes for this computation?

You can pass the num_workers argument to compute to specify the number of processes:

results = compute(*delayed_values, scheduler='processes', num_workers=5)
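
As a quick sanity check that several processes are really used, you can have each task report its process ID and count the distinct values (a minimal sketch; report_pid is a throwaway helper, not part of the question's code):

import os
from dask import compute, delayed

def report_pid(x):
    # each task returns the PID of the worker process that ran it
    return os.getpid()

delayed_values = [delayed(report_pid)(x) for x in range(10)]
pids = compute(*delayed_values, scheduler='processes', num_workers=5)
print(len(set(pids)))  # at most 5 distinct process IDs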

Alternatively, you can configure dask to use a custom process pool, like so:

import dask
from multiprocessing.pool import Pool

dask.config.set(pool=Pool(5))
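
With that setting in place, a plain compute call on the processes scheduler picks up the configured pool (a sketch reusing the question's code; on Windows/macOS the Pool creation should sit under an if __name__ == "__main__": guard):

import dask
from multiprocessing.pool import Pool
from dask import compute, delayed

def do_something(x):
    return x * x

dask.config.set(pool=Pool(5))  # the processes scheduler reads the 'pool' setting

delayed_values = [delayed(do_something)(x) for x in range(10)]
results = compute(*delayed_values, scheduler='processes')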

You can also apply the configuration as a context manager, which restores the previous settings on exit:

with dask.config.set(scheduler='processes', num_workers=5):
    ...
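
For example, wrapping the question's compute call (a sketch; inside the with block the 5-process pool is used, and the setting reverts afterwards):

import dask
from dask import compute, delayed

def do_something(x):
    return x * x

delayed_values = [delayed(do_something)(x) for x in range(10)]

with dask.config.set(scheduler='processes', num_workers=5):
    # scheduler and num_workers are taken from the surrounding config context
    results = compute(*delayed_values)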

You may want to read the dask scheduling documentation (dask_scheduling), or my previous answer.
