Scheduling GPU in Kubernetes v1.13.1
I'm trying to schedule GPUs in Kubernetes v1.13.1, and I followed the guide at https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#deploying-nvidia-gpu-device-plugin
But the GPU resources don't show up when I run
kubectl get nodes -o yaml
so, following this post, I checked the NVIDIA GPU device plugin.
I ran:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml
several times, and the result is:
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml": daemonsets.extensions "nvidia-device-plugin-daemonset" already exists
It seems that I have already installed the NVIDIA device plugin? But the output of
kubectl get pods --all-namespaces
is:
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-qdhvd 2/2 Running 0 65m
kube-system coredns-78d4cf999f-fk4wl 1/1 Running 0 68m
kube-system coredns-78d4cf999f-zgfvl 1/1 Running 0 68m
kube-system etcd-liuqin01 1/1 Running 0 67m
kube-system kube-apiserver-liuqin01 1/1 Running 0 67m
kube-system kube-controller-manager-liuqin01 1/1 Running 0 67m
kube-system kube-proxy-l8p9p 1/1 Running 0 68m
kube-system kube-scheduler-liuqin01 1/1 Running 0 67m
When I run
kubectl describe node
the GPU is not among the allocatable resources:
Non-terminated Pods: (9 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ----------- - ---------- --------------- ------------- ---
kube-system calico-node-qdhvd 250m (2%) 0 (0%) 0 (0%) 0 (0%) 18h
kube-system coredns-78d4cf999f-fk4wl 100m (0%) 0 (0%) 70Mi (0%) 170Mi (1%) 19h
kube-system coredns-78d4cf999f-zgfvl 100m (0%) 0 (0%) 70Mi (0%) 170Mi (1%) 19h
kube-system etcd-liuqin01 0 (0%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-apiserver-liuqin01 250m (2%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-controller-manager-liuqin01 200m (1%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-proxy-l8p9p 0 (0%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-scheduler-liuqin01 100m (0%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system nvidia-device-plugin-daemonset-p78wz 0 (0%) 0 (0%) 0 (0%) 0 (0%) 26m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1 (8%) 0 (0%)
memory 140Mi (0%) 340Mi (2%)
ephemeral-storage 0 (0%) 0 (0%)
As lianyouCat mentioned in the comments:
After installing nvidia-docker2, Docker's default runtime should be changed to nvidia, as described at github.com/NVIDIA/k8s-device-plugin#preparing-your-gpu-nodes .
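For reference, the linked README section sets the default runtime in /etc/docker/daemon.json. Below is a minimal sketch of that file, written to a scratch path first so it can be reviewed before being copied into place. The scratch filename daemon.json.new is an assumption made for illustration, and the runtime path is the usual nvidia-docker2 install location, which may differ on your system. If you already have settings in /etc/docker/daemon.json, merge these keys into it rather than overwriting the file.

```shell
# Write the config to a scratch file (the filename is chosen for illustration).
cat > daemon.json.new <<'EOF'
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
EOF

# After reviewing it, install it on the GPU node (run manually, needs root):
#   sudo cp daemon.json.new /etc/docker/daemon.json
```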
After modifying
/etc/docker/daemon.json
you need to restart Docker for the configuration to take effect.
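One way to confirm the change took effect is sketched below. It assumes a systemd-managed Docker host. The helper name has_gpu_resource is made up for illustration, while nvidia.com/gpu is the resource name the NVIDIA device plugin advertises on the node.

```shell
# Succeeds if the text on stdin mentions the nvidia.com/gpu resource.
# (has_gpu_resource is an illustrative helper, not part of kubectl.)
has_gpu_resource() {
    grep -qF 'nvidia.com/gpu'
}

# On the GPU node, after editing /etc/docker/daemon.json (run manually):
#   sudo systemctl restart docker
#   docker info | grep -i 'default runtime'
#   kubectl describe node | has_gpu_resource && echo "GPU is advertised"
```

If the resource still does not appear, the logs of the nvidia-device-plugin-daemonset pod in kube-system are a reasonable next place to look.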