
How to determine the maximal number of dask.distributed.cluster workers

TL;DR: how can I access the maximum number of workers available to a dask.distributed.client, including workers that aren't started yet, in a way that works with both adaptive and non-adaptive scaling strategies?

I develop a library for adaptive parallel execution of functions, which plans ahead which points to execute. For that we need to know how many workers can be used in parallel, and we currently rely on the client.ncores() function.
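
For reference, a minimal sketch of what we do today (the local cluster size and addresses below are purely illustrative):

    from dask.distributed import Client, LocalCluster

    # Purely illustrative local setup.
    cluster = LocalCluster(n_workers=2, threads_per_worker=2)
    client = Client(cluster)

    # ncores() maps each *currently running* worker address to its thread count,
    # e.g. {'tcp://127.0.0.1:34567': 2, 'tcp://127.0.0.1:34568': 2}
    cores_per_worker = client.ncores()
    max_parallel = sum(cores_per_worker.values())  # what we treat as the parallelism limit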

This approach is, however, problematic for several reasons:

  • The workers need to be running in the first place, because ncores only reports what is happening right now.
  • If the cluster has adaptive scaling, we're interested in the maximum number of workers we can get, not the current count.

Therefore I would like to know if there's a programmatic way to inspect the client and determine how many workers a dask cluster can acquire.

A general solution to what you're looking for doesn't and can't (to my knowledge) exist. There's no hard limit on the number of dask workers for many system types. Anything you find here will likely have to be tailored to individual cluster variants, if it works at all.

This would have to be a property of the Cluster, not the client. A LocalCluster can spin up as many workers/processes/threads as desired - at some point this stops being efficient, but the number isn't bounded. See LocalCluster and its parent class SpecCluster for implementation details.
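
For illustration, a hedged sketch of the local case (the numbers are arbitrary, and reading minimum/maximum off the returned Adaptive object is an assumption that may vary between distributed versions): the only upper bound that exists is the one you configure yourself.

    from dask.distributed import LocalCluster

    # Nothing stops you from requesting a large number of local workers;
    # the figures here are arbitrary.
    cluster = LocalCluster(n_workers=4, threads_per_worker=1)

    # With adaptive scaling, the only "maximum" is the bound you set yourself.
    # Recent distributed versions keep it on the returned Adaptive object.
    adaptive = cluster.adapt(minimum=1, maximum=16)
    print(adaptive.minimum, adaptive.maximum)  # 1 16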

Other flavors such as dask_jobqueue have a totally different model, whereby nodes are allocated by the HPC workload manager and are theoretically unbounded but are in practice limited by other workloads on the cluster and by your account & the HPC's configuration; similarly kube_cluster will scale up unless limited by the helm chart, quotas, available resources, or your credit card bouncing. If using dask Gateway, the administrator can specify maximum core, memory, and worker counts for the entire cluster, and these limits can be accessed through the cluster config. However, if these are unbounded, there's no hard limit to these values that is known to dask.
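
To make the jobqueue case concrete, here's a hedged sketch with dask_jobqueue (the queue name, resource sizes, and job counts are made up; the minimum_jobs/maximum_jobs keywords exist in recent dask_jobqueue releases, but verify against your installed version):

    from dask_jobqueue import SLURMCluster

    # Each job asks the HPC scheduler for one worker of this shape
    # (queue and resources are hypothetical).
    cluster = SLURMCluster(queue="regular", cores=8, memory="32GB")

    # The only "maximum" dask knows about is what you request:
    cluster.scale(jobs=10)                            # fixed size: ask for 10 jobs
    # cluster.adapt(minimum_jobs=1, maximum_jobs=50)  # or an adaptive upper bound

    # Whether SLURM ever grants those jobs depends on site limits, your account,
    # and everyone else's workload; none of that is visible to dask.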

If you need to implement this for a specific cluster variant, you could narrow your question and maybe get more specific help; but I think you may be limited here to using the APIs of each cluster to get the info you want (e.g. google/aws/azure etc.'s kubernetes API, maybe pyslurm, etc.), and there's no guarantee that there's an actual upper bound for many of these options.
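
For example, on Kubernetes you could fall back to the cluster's own API to estimate total capacity. A sketch using the official kubernetes Python client (this reports node-allocatable CPU, which is only a rough proxy for how many dask worker cores you could ever run):

    from kubernetes import client, config

    # Load credentials the same way kubectl does (assumes a configured kubeconfig).
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # Allocatable CPU values are Kubernetes quantity strings such as "4" or "3800m".
    def cpus(quantity: str) -> float:
        return float(quantity[:-1]) / 1000 if quantity.endswith("m") else float(quantity)

    # Sum allocatable CPUs across nodes as a rough upper bound on worker cores.
    total_cpus = sum(cpus(node.status.allocatable["cpu"]) for node in v1.list_node().items)
    print(total_cpus)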
