dask kubernetes aks (azure) virtual nodes

Using the code below, it is possible to create a dask kubernetes cluster in Azure AKS.

It uses a remote scheduler (dask.config.set({"kubernetes.scheduler-service-type": "LoadBalancer"})) and works perfectly.

To use virtual nodes, uncomment the line extra_pod_config=virtual_config (which follows this official example).

It doesn't work; it fails with the following error:

ACI does not support providing args without specifying the command. Please supply both command and args to the pod spec.

This is tied to passing containers: args: [dask-scheduler].

Which containers: command: should I supply to fix this issue?

Thank you

import dask
from dask.distributed import Client
from dask_kubernetes import KubeCluster, KubeConfig, make_pod_spec

image = "daskdev/dask"
cluster = "aks-cluster1"
dask.config.set({"kubernetes.scheduler-service-type": "LoadBalancer"})
dask.config.set({"distributed.comm.timeouts.connect": 180})
virtual_config = {
    "nodeSelector": {
        "kubernetes.io/role": "agent",
        "beta.kubernetes.io/os": "linux",
        "type": "virtual-kubelet",
    },
    "tolerations": [
        {"key": "virtual-kubelet.io/provider", "operator": "Exists"},
    ],
}

pod_spec = make_pod_spec(
    image=image,
    # extra_pod_config=virtual_config,
    memory_limit="2G",
    memory_request="2G",
    cpu_limit=1,
    cpu_request=1,
    threads_per_worker=1,  # same as cpu
)

# az aks get-credentials --name aks-cluster1 --resource-group resource_group1
# cp ~/.kube/config ./aksconfig.yaml
auth = KubeConfig(config_file="./aksconfig.yaml", context=cluster,)
cluster = KubeCluster(
    pod_spec, auth=auth, deploy_mode="remote", scheduler_service_wait_timeout=180
)
client = Client(cluster)

The reason is this virtual kubelet safeguard: in the pod configuration, dask uses args to start the scheduler or worker, but supplies no command.
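To make the rule concrete, here is a minimal sketch of the check described above, written against plain dictionaries shaped like a Kubernetes container spec. The function name violates_aci_args_rule and both example specs are hypothetical; this only restates the quoted error message and is not virtual kubelet's actual code. In Kubernetes, omitting command means the image ENTRYPOINT is used and args replace the image CMD, which is exactly the case ACI refuses.

```python
from typing import Any, Dict


def violates_aci_args_rule(container: Dict[str, Any]) -> bool:
    """Return True if a container spec has args but no command.

    Hypothetical restatement of the ACI restriction from the error:
    "ACI does not support providing args without specifying the command."
    """
    has_args = bool(container.get("args"))
    has_command = bool(container.get("command"))
    return has_args and not has_command


# Hypothetical scheduler container: args only, relying on the image ENTRYPOINT.
scheduler_container = {"image": "daskdev/dask", "args": ["dask-scheduler"]}

# The same container with the explicit command from command_entrypoint_explicit.
fixed_container = {
    "image": "daskdev/dask",
    "command": ["tini", "-g", "--", "/usr/bin/prepare.sh"],
    "args": ["dask-scheduler"],
}

print(violates_aci_args_rule(scheduler_container))  # True
print(violates_aci_args_rule(fixed_container))      # False
```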

So I explicitly added the entrypoint command command_entrypoint_explicit, and it works: the pods are created successfully.
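The effect of extra_container_config can be pictured as a dictionary merge into the generated container spec. The sketch below uses a hypothetical deep_merge helper on plain dicts; dask_kubernetes does its own merging internally, so this only illustrates the resulting shape. The worker args shown are illustrative, not the exact list make_pod_spec produces.

```python
import copy
from typing import Any, Dict


def deep_merge(base: Dict[str, Any], extra: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of base with extra merged in.

    Nested dicts are merged recursively; any other value in extra replaces
    the value in base. Neither input is modified.
    """
    result = copy.deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Hypothetical worker container as dask might generate it: args only.
worker_container = {
    "name": "dask-worker",
    "image": "daskdev/dask",
    "args": ["dask-worker", "--nthreads", "1", "--memory-limit", "2G"],
}

# Same value as in the answer's complete solution.
command_entrypoint_explicit = {
    "command": ["tini", "-g", "--", "/usr/bin/prepare.sh"],
}

merged = deep_merge(worker_container, command_entrypoint_explicit)
print(merged["command"])  # ['tini', '-g', '--', '/usr/bin/prepare.sh']
print(merged["args"][0])  # dask-worker
```

Since command replaces the image ENTRYPOINT and args are passed to it, the container runs prepare.sh with the dask-worker arguments, matching what the image's default ENTRYPOINT would have done, while now satisfying ACI.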

Second problem: network name resolution. The workers fail to connect to the scheduler by network name: tcp://{name}.{namespace}:{port}

However, tcp://{name}.{namespace}.svc.cluster.local:{port} does work. I changed this in dask_kubernetes.core.Scheduler.start, and it works.
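The difference between the two address forms is just the cluster domain suffix. Below is a minimal sketch of building both forms; scheduler_address and the service name dask-root-abc123 are hypothetical, and the suffix assumes the default AKS cluster domain cluster.local. This is not the actual code in dask_kubernetes.core.Scheduler.start.

```python
def scheduler_address(name: str, namespace: str, port: int,
                      fully_qualified: bool = True) -> str:
    """Build a tcp:// address for the scheduler Service.

    With fully_qualified=True the host becomes
    {name}.{namespace}.svc.cluster.local (assumes the default
    cluster domain), the form that resolved from the virtual-node pods.
    """
    host = f"{name}.{namespace}"
    if fully_qualified:
        host = f"{host}.svc.cluster.local"
    return f"tcp://{host}:{port}"


print(scheduler_address("dask-root-abc123", "default", 8786, fully_qualified=False))
# tcp://dask-root-abc123.default:8786
print(scheduler_address("dask-root-abc123", "default", 8786))
# tcp://dask-root-abc123.default.svc.cluster.local:8786
```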

Another option is the dnsConfig in the virtual_config below. The code below is a complete solution.

import dask
from dask.distributed import Client
from dask_kubernetes import KubeCluster, KubeConfig, make_pod_spec

dask.config.set({"kubernetes.scheduler-service-type": "LoadBalancer"})
dask.config.set({"distributed.comm.timeouts.connect": 180})
image = "daskdev/dask"
cluster = "aks-cluster-prod3"
virtual_config = {
    "nodeSelector": {
        "kubernetes.io/role": "agent",
        "beta.kubernetes.io/os": "linux",
        "type": "virtual-kubelet",
    },
    "tolerations": [
        {"key": "virtual-kubelet.io/provider", "operator": "Exists"},
        {"key": "azure.com/aci", "effect": "NoSchedule"},
    ],
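    # Search domains so that a short name such as {name}.{namespace} is also
    # tried as {name}.{namespace}.svc.cluster.local from the ACI pods
    "dnsConfig": {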
        "options": [{"name": "ndots", "value": "5"}],
        "searches": [
            "default.svc.cluster.local",
            "svc.cluster.local",
            "cluster.local",
        ],
    },
}

# copied from: https://github.com/dask/dask-docker/blob/master/base/Dockerfile#L25
command_entrypoint_explicit = {
    "command": ["tini", "-g", "--", "/usr/bin/prepare.sh"],
}

pod_spec = make_pod_spec(
    image=image,
    extra_pod_config=virtual_config,
    extra_container_config=command_entrypoint_explicit,
    memory_limit="2G",
    memory_request="2G",
    cpu_limit=1,
    cpu_request=1,
    threads_per_worker=1,  # same as cpu
)

# az aks get-credentials --name aks-cluster1 --resource-group resource_group1
# cp ~/.kube/config ./aksconfig.yaml
auth = KubeConfig(config_file="./aksconfig.yaml", context=cluster,)
cluster = KubeCluster(
    pod_spec,
    auth=auth,
    deploy_mode="remote",
    scheduler_service_wait_timeout=180,
    n_workers=1,
)
client = Client(cluster)
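Why the dnsConfig helps can be seen from the resolver's search-list rule: a name with fewer dots than ndots is first tried with each search domain appended, and only then as-is. The sketch below is a simplified model of that resolv.conf behaviour (real resolvers add retries, timeouts and other options); lookup_order and the service name dask-root-abc123 are hypothetical.

```python
from typing import List


def lookup_order(name: str, searches: List[str], ndots: int) -> List[str]:
    """Return the order in which a resolv.conf-style resolver tries names.

    Simplified model: a trailing dot means "absolute, no search";
    names with at least ndots dots are tried as-is first, others last.
    """
    if name.endswith("."):
        return [name]
    expanded = [f"{name}.{domain}" for domain in searches]
    if name.count(".") >= ndots:
        return [name] + expanded
    return expanded + [name]


# Values taken from the dnsConfig above.
searches = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for candidate in lookup_order("dask-root-abc123.default", searches, ndots=5):
    print(candidate)
# dask-root-abc123.default.default.svc.cluster.local
# dask-root-abc123.default.svc.cluster.local   <- the Service's real FQDN
# dask-root-abc123.default.cluster.local
# dask-root-abc123.default
```

With the "svc.cluster.local" search domain in place, the short {name}.{namespace} form expands to the fully qualified Service name, which is why workers can connect without patching the address in dask_kubernetes.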
