I am trying to use Dask to do some embarrassingly parallel processing. For reasons I won't go into I have to use Dask, but the task could easily be achieved with multiprocessing.Pool(5).map.
For example:
import dask
from dask import compute, delayed
def do_something(x): return x * x
data = range(10)
delayed_values = [delayed(do_something)(x) for x in data]
results = compute(*delayed_values, scheduler='processes')
It works, but apparently it uses only one process.
How can I configure dask so it uses a pool of 5 processes for this computation?
You can use the num_workers argument of compute to specify the number of processes:
results = compute(*delayed_values, scheduler='processes', num_workers=5)
You can also configure Dask to use a custom process pool, like so:
import dask
from multiprocessing.pool import Pool
dask.config.set(pool=Pool(5))
or as a context manager:
with dask.config.set(scheduler='processes', num_workers=5):
...
You may also want to read the Dask docs on scheduling (dask_scheduling) or my previous answer.