How do I properly label and configure Kubernetes to use Nvidia GPUs?

Question

I have an in-house K8s cluster running on bare metal. One of my worker nodes has 4 GPUs, and I want to configure K8s to recognise and use them. Following the official documentation, I installed everything required, and now when I run:

docker run --runtime=nvidia --rm nvidia/cuda nvidia-smi


Tue Nov 12 09:20:20 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:02:00.0 Off |                  N/A |
| 29%   25C    P8     2W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce RTX 208...  On   | 00000000:03:00.0 Off |                  N/A |
| 29%   25C    P8     1W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce RTX 208...  On   | 00000000:82:00.0 Off |                  N/A |
| 29%   26C    P8     2W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce RTX 208...  On   | 00000000:83:00.0 Off |                  N/A |
| 29%   26C    P8    12W / 250W |      0MiB / 10989MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

I know that I have to label the node so that K8s recognises these GPUs, but I can't find the correct labels in the official documentation. In the docs I only see this:

# Label your nodes with the accelerator type they have.
kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80

While in another tutorial (just for Google Cloud) I found this:

aliyun.accelerator/nvidia_count=1                          #This field is important.
aliyun.accelerator/nvidia_mem=12209MiB
aliyun.accelerator/nvidia_name=Tesla-M40

So what is the proper way to label my node? Do I also need to label it with the number of GPUs and their memory size?

Answer 1

I see you are trying to make sure that your pod gets scheduled on a node with GPUs.

The easiest way to do that is to label the GPU node like this:

kubectl label node <node_name> has_gpu=true

Then, when creating your pod, add a nodeSelector field with has_gpu: "true" (quoted, because label values are strings). This way the pod will be scheduled only on nodes with GPUs. Read more in the k8s docs.
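As an illustration of the nodeSelector approach, here is a minimal sketch that writes a pod manifest to a file. The pod name, container name and file name are hypothetical. The image is the one from the question and may need an explicit tag. Running nvidia-smi inside the pod assumes the NVIDIA runtime is the default container runtime on that node, which a label alone does not guarantee.

```shell
# Write a hypothetical pod manifest that targets nodes labelled has_gpu=true.
cat > gpu-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-selector-demo          # hypothetical name
spec:
  restartPolicy: Never
  nodeSelector:
    has_gpu: "true"                # quoted: nodeSelector values must be strings
  containers:
  - name: cuda                     # hypothetical name
    image: nvidia/cuda             # image from the question; pin a tag if needed
    command: ["nvidia-smi"]        # assumes the nvidia runtime is the node default
EOF
echo "Apply with: kubectl apply -f gpu-pod.yaml"
```

Without the quotes, YAML parses true as a boolean and the API server rejects the manifest, since nodeSelector is a map of string to string.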

The only problem is that in this case the scheduler does not know how many GPUs are on the node, so it can schedule more than 4 pods on a node that has only 4 GPUs.

A better option is to use a node extended resource.

It looks as follows:

Run kubectl proxy.

Patch the node's resource configuration:

curl --header "Content-Type: application/json-patch+json" \
  --request PATCH \
  --data '[{"op": "add", "path": "/status/capacity/example.com~1gpu", "value": "4"}]' \
  http://localhost:8001/api/v1/nodes/<your-node-name>/status

Assign the extended resource to a pod:

apiVersion: v1
kind: Pod
metadata:
  name: extended-resource-demo
spec:
  containers:
  - name: extended-resource-demo-ctr
    image: my_pod_name
    resources:
      requests:
        example.com/gpu: 1
      limits:
        example.com/gpu: 1

In this case the scheduler knows how many GPUs are available on the node and won't schedule more pods if it cannot satisfy their requests.
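One detail in the patch request that is easy to miss: the path contains example.com~1gpu rather than example.com/gpu, because in a JSON Pointer (RFC 6901) a literal "/" inside a key is written as "~1" and a literal "~" as "~0". The sketch below builds the same patch body for an arbitrary resource name. The function names are hypothetical helpers, not part of kubectl or curl.

```shell
# Escape a resource name for use as one segment of a JSON Pointer path:
# "~" becomes "~0" first, then "/" becomes "~1".
escape_json_pointer() {
  printf '%s' "$1" | sed -e 's/~/~0/g' -e 's#/#~1#g'
}

# Build the JSON Patch body that advertises COUNT units of RESOURCE
# in the node's status.capacity.
build_capacity_patch() {
  resource=$1
  count=$2
  printf '[{"op": "add", "path": "/status/capacity/%s", "value": "%s"}]\n' \
    "$(escape_json_pointer "$resource")" "$count"
}

# Prints the same body that is passed to curl --data in the answer.
build_capacity_patch "example.com/gpu" 4
```

The printed body can be passed to curl via --data while kubectl proxy is running. Afterwards, kubectl describe node <your-node-name> should list example.com/gpu under Capacity.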

Answer 1
0 Accepted 2019-11-12 15:14:11
