What are the default values of the parameters in Dask-ML's Client() function?

Question

I am trying to understand the parameters of Dask-ML's Client() function. Say I have the following code using Client():

from dask.distributed import Client
import joblib

client = Client()

If I don't specify any values for the parameters of Client(), what are the default values of the following parameters?

(i) n_workers

(ii) threads_per_worker

(iii) memory_limit

From my understanding, Python has the Global Interpreter Lock (GIL), which prevents multi-threading. If so, why does Client() have the parameter threads_per_worker when multi-threading is prevented in Python?

Does memory_limit refer to the maximum memory allowed for each worker/machine/node, or to the maximum memory allowed for all workers/machines/nodes combined?

I have already looked through the documentation (see here: https://docs.dask.org/en/latest/setup/single-distributed.html ), but it is not clear regarding the questions above.

Thank you in advance to anyone who can explain this.

Answer 1

Calling Client() without any arguments starts a LocalCluster() by default, so

client = Client()

is really the same as

from dask.distributed import Client, LocalCluster

cluster = LocalCluster()
client = Client(cluster)

So, to start, you might take a look at the LocalCluster documentation.

"what are the default values for the parameters:"

The ideal values depend both on your hardware and on your workload. We don't know your workload up front, but we do know your hardware, so we try to make sensible decisions based on that.

Today that policy is to split all of your logical cores and memory evenly among a number of worker processes roughly equal to the square root of the number of cores that you have. So if you have 12 cores, we would create four processes with three threads each, and each process would get an equal share of the memory.
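
The 12-core example can be reproduced with a short sketch. This is an assumption about the rule, not code taken from Dask: it picks the smallest divisor of the core count that is at least its square root as the number of processes, gives each process an equal number of threads, and divides memory evenly. The function name split_cores and the 16 GB figure are made up for illustration, and the real defaults may differ between versions.

```python
import math


def split_cores(total_cores, total_memory_bytes):
    """Hypothetical helper: split cores and memory across worker processes.

    Assumed rule (not copied from Dask): use the smallest divisor of
    total_cores that is >= sqrt(total_cores) as the number of processes.
    """
    root = math.sqrt(total_cores)
    # Divisors of total_cores that are at least its square root.
    candidates = [d for d in range(1, total_cores + 1)
                  if total_cores % d == 0 and d >= root]
    n_workers = min(candidates)
    threads_per_worker = total_cores // n_workers
    # Memory is shared out evenly, so the limit applies per worker.
    memory_per_worker = total_memory_bytes // n_workers
    return n_workers, threads_per_worker, memory_per_worker


# 12 logical cores and an assumed 16 GB of RAM.
print(split_cores(12, 16 * 10**9))  # (4, 3, 4000000000)
```

Under this assumed rule, the memory figure in the result is a per-worker share rather than a total for the whole machine.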

This tends to be an OK default in most situations, but we encourage you to experiment and see whether your workloads perform better under different settings.
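
One way to experiment is to pass the values explicitly instead of relying on the defaults. The sketch below assumes dask.distributed is installed. n_workers, threads_per_worker, and memory_limit are the parameters named in the question, while the helper names layouts and start_client and the "2GB" value are invented here for illustration.

```python
def layouts(total_cores):
    """List every (n_workers, threads_per_worker) pair that uses all cores."""
    return [(w, total_cores // w)
            for w in range(1, total_cores + 1)
            if total_cores % w == 0]


def start_client(n_workers, threads_per_worker, memory_limit="2GB"):
    """Start a local cluster with an explicit layout (hypothetical helper)."""
    # Imported here so that layouts() can be used without Dask installed.
    from dask.distributed import Client, LocalCluster

    cluster = LocalCluster(
        n_workers=n_workers,
        threads_per_worker=threads_per_worker,
        memory_limit=memory_limit,  # applies to each worker, not the total
    )
    return Client(cluster)


print(layouts(12))  # [(1, 12), (2, 6), (3, 4), (4, 3), (6, 2), (12, 1)]
```

Calling start_client(6, 2), for example, and timing the same workload under each layout is one way to compare them. In a script, it is safest to make that call under an `if __name__ == "__main__":` guard, since the workers run as separate processes.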


Answer 1
1 accepted 2020-06-13 15:10:03
