Scheduling GPUs in Kubernetes v1.13.1

I'm trying to schedule GPUs in Kubernetes v1.13.1, and I followed the guide at https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#deploying-nvidia-gpu-device-plugin

But the GPU resources don't show up when I run kubectl get nodes -o yaml. Following this post, I checked the NVIDIA GPU device plugin.

I ran:

kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml

several times, and the result is:

Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml": daemonsets.extensions "nvidia-device-plugin-daemonset" already exists

So it seems that I have already installed the NVIDIA device plugin? But the output of kubectl get pods --all-namespaces is:

NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE
kube-system   calico-node-qdhvd                  2/2     Running   0          65m
kube-system   coredns-78d4cf999f-fk4wl           1/1     Running   0          68m
kube-system   coredns-78d4cf999f-zgfvl           1/1     Running   0          68m
kube-system   etcd-liuqin01                      1/1     Running   0          67m
kube-system   kube-apiserver-liuqin01            1/1     Running   0          67m
kube-system   kube-controller-manager-liuqin01   1/1     Running   0          67m
kube-system   kube-proxy-l8p9p                   1/1     Running   0          68m
kube-system   kube-scheduler-liuqin01            1/1     Running   0          67m

When I run kubectl describe node, the GPU is not listed among the allocatable resources:

Non-terminated Pods:         (9 in total)
Namespace                  Name                                    CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
---------                  ----                                    ------------  ----------  ---------------  -------------  ---
kube-system                calico-node-qdhvd                       250m (2%)     0 (0%)      0 (0%)           0 (0%)         18h
kube-system                coredns-78d4cf999f-fk4wl                100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     19h
kube-system                coredns-78d4cf999f-zgfvl                100m (0%)     0 (0%)      70Mi (0%)        170Mi (1%)     19h
kube-system                etcd-liuqin01                           0 (0%)        0 (0%)      0 (0%)           0 (0%)         19h
kube-system                kube-apiserver-liuqin01                 250m (2%)     0 (0%)      0 (0%)           0 (0%)         19h
kube-system                kube-controller-manager-liuqin01        200m (1%)     0 (0%)      0 (0%)           0 (0%)         19h
kube-system                kube-proxy-l8p9p                        0 (0%)        0 (0%)      0 (0%)           0 (0%)         19h
kube-system                kube-scheduler-liuqin01                 100m (0%)     0 (0%)      0 (0%)           0 (0%)         19h
kube-system                nvidia-device-plugin-daemonset-p78wz    0 (0%)        0 (0%)      0 (0%)           0 (0%)         26m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource           Requests    Limits
--------           --------    ------
cpu                1 (8%)      0 (0%)
memory             140Mi (0%)  340Mi (2%)
ephemeral-storage  0 (0%)      0 (0%)

As lianyouCat mentioned in the comments:

After installing nvidia-docker2, Docker's default runtime should be changed to nvidia, as described at github.com/NVIDIA/k8s-device-plugin#preparing-your-gpu-nodes.
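
For reference, the README linked above makes nvidia the default runtime through /etc/docker/daemon.json. The following is a minimal sketch, assuming nvidia-container-runtime is installed at /usr/bin/nvidia-container-runtime (check the path on your node). It writes the example to a scratch file instead of the live config, so any settings you already have can be merged by hand. The scratch path and the helper has_nvidia_default_runtime are illustrative names, not part of any tool.

```shell
# Write an example daemon.json to a scratch file (NOT /etc/docker/daemon.json),
# so existing Docker settings are not overwritten by accident.
example="${TMPDIR:-/tmp}/daemon.json.example"   # hypothetical scratch path

cat > "$example" <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

# Hypothetical helper: succeeds if the given file sets "default-runtime" to "nvidia".
has_nvidia_default_runtime() {
    grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$1"
}

has_nvidia_default_runtime "$example" && echo "default runtime is nvidia in $example"
```

Once the content looks right, it can be merged into /etc/docker/daemon.json on each GPU node, and the same helper can be pointed at that file to confirm the setting is present.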

After modifying /etc/docker/daemon.json, you need to restart Docker for the configuration to take effect.
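
The restart and a follow-up check might look like the sketch below. It assumes Docker is managed by systemd under the unit name docker, and both helper names are illustrative. The privileged commands sit inside a function that is not called automatically, so nothing is restarted just by loading the snippet.

```shell
# Hypothetical helper: restart Docker and show which runtime is now the default.
# Assumes a systemd-managed Docker service named "docker".
restart_and_check_docker() {
    sudo systemctl restart docker
    # "docker info" prints a "Default Runtime:" line; it should now read "nvidia".
    docker info 2>/dev/null | grep -i 'default runtime'
}

# Hypothetical helper: reads node output on stdin and succeeds if the
# nvidia.com/gpu resource appears anywhere in it.
node_advertises_gpu() {
    grep -q 'nvidia.com/gpu'
}

# Usage, run by hand on the GPU node:
#   restart_and_check_docker
#   kubectl describe node | node_advertises_gpu && echo "GPU is advertised"
```

The device plugin pod may need a short while to come back after Docker restarts, so the second check can fail at first and succeed a little later.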
