
QoS class of Guaranteed for Pod in Kubernetes

On my Kubernetes nodes there are:

  1. prioritized pods
  2. dispensable pods

Therefore I would like the prioritized pods to have the QoS class of Guaranteed. To achieve the Guaranteed class, the CPU/memory requests and limits must meet certain conditions, among them:

For every Container in the Pod, the CPU limit must equal the CPU request.

But I would like to set a higher CPU limit than request, so that the prioritized pods can use any free CPU resources that are available.

Simple example: a node with 4 cores has:

  • 1 prioritized pod with a 2000m CPU request and a 3900m CPU limit
  • 3 dispensable pods with a 500m CPU request and limit each.

If the prioritized pod had a 2000m CPU request and limit, 2 cores would be wasted, because the dispensable pods don't use CPU most of the time.

If the prioritized pod had a 3900m CPU request and limit, I would need an extra node for the dispensable pods.
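The trade-off above can be sketched as a pod spec (name and image are placeholders). Because the CPU limit exceeds the request, Kubernetes would assign this pod the Burstable QoS class rather than Guaranteed:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: prioritized-pod                      # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
    resources:
      requests:
        cpu: "2000m"
        memory: 1Gi
      limits:
        cpu: "3900m"      # > request, so the QoS class becomes Burstable
        memory: 1Gi       # memory request == limit
```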

Questions

Is it possible to explicitly set the Guaranteed class on a pod even with different CPU request and limit?

If it's not possible: why is there no way to explicitly set the QoS class?

Remarks

There's a system-cluster-critical option, but I think this should only be used for critical k8s add-on pods, not for critical applications.
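For illustration, using that option is just a matter of setting priorityClassName (a minimal sketch; note that by default Kubernetes restricts the built-in system-cluster-critical class to pods in the kube-system namespace, which is another reason it doesn't suit ordinary applications):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: critical-addon                         # hypothetical name
  namespace: kube-system                       # system classes are restricted to kube-system by default
spec:
  priorityClassName: system-cluster-critical
  containers:
  - name: addon
    image: registry.example.com/addon:latest   # placeholder image
```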

Is it possible to explicitly set the Guaranteed class on a pod even with different CPU request and limit?

Yes, however you will need to use an additional plugin: capacity-scheduling, used together with PriorityClass:

There is increasing demand to use Kubernetes to manage batch workloads (ML/DL). In those cases, one challenge is to improve cluster utilization while ensuring that each user has a reasonable amount of resources. The problem can be partially addressed by the Kubernetes ResourceQuota. The native Kubernetes ResourceQuota API can be used to specify the maximum overall resource allocation per namespace. The quota enforcement is done through an admission check. A quota consumer (e.g., a Pod) cannot be created if the aggregated resource allocation exceeds the quota limit. In other words, the overall resource usage is aggregated based on the Pod's spec (i.e., cpu/mem requests) when it's created. The Kubernetes quota design has a limitation: the quota resource usage is aggregated based on the resource configurations (e.g., Pod cpu/mem requests specified in the Pod spec). Although this mechanism can guarantee that the actual resource consumption will never exceed the ResourceQuota limit, it might lead to low resource utilization, as some pods may have claimed the resources but failed to be scheduled. For instance, actual resource consumption may be much smaller than the limit.


Pods can be created at a specific priority. You can control a pod's consumption of system resources based on a pod's priority, by using the scopeSelector field in the quota spec.
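A priority is defined with a PriorityClass object and referenced from the pod spec; a minimal sketch (names and value are illustrative):

```yaml
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: prioritized               # hypothetical name
value: 1000000                    # higher value = higher scheduling priority
globalDefault: false
description: "For the prioritized pods"
---
apiVersion: v1
kind: Pod
metadata:
  name: prioritized-pod                      # hypothetical name
spec:
  priorityClassName: prioritized             # references the class above
  containers:
  - name: app
    image: registry.example.com/app:latest   # placeholder image
```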

A quota is matched and consumed only if scopeSelector in the quota spec selects the pod.

When a quota is scoped for priority class using the scopeSelector field, the quota object is restricted to track only the following resources:

  • pods
  • cpu
  • memory
  • ephemeral-storage
  • limits.cpu
  • limits.memory
  • limits.ephemeral-storage
  • requests.cpu
  • requests.memory
  • requests.ephemeral-storage
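For example, a ResourceQuota scoped to a priority class tracks only pods created with that priority (quota values and the PriorityClass name are illustrative):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota-prioritized           # hypothetical name
spec:
  hard:
    pods: "10"
    requests.cpu: "8"
    requests.memory: 16Gi
  scopeSelector:
    matchExpressions:
    - operator: In
      scopeName: PriorityClass
      values: ["prioritized"]       # hypothetical PriorityClass name
```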

This plugin also supports preemption (an example for ElasticQuota):

Preemption happens when a pod is unschedulable, i.e., it failed in the PreFilter or Filter phases.

In particular, for capacity scheduling, the failure reasons could be:

  • PreFilter stage
  • sum(allocated res of pods in the same elasticquota) + pod.request > elasticquota.spec.max
  • sum(allocated res of pods in the same elasticquota) + pod.request > sum(elasticquota.spec.min)

So the preemption logic will attempt to make the pod schedulable, at the cost of preempting other running pods.
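The spec.min and spec.max referenced in the conditions above come from the plugin's ElasticQuota CRD; a sketch modeled on the scheduler-plugins examples (namespace and values are illustrative):

```yaml
apiVersion: scheduling.x-k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  name: quota-team-a
  namespace: team-a   # hypothetical namespace; the quota applies to pods in its namespace
spec:
  max:                # hard cap: aggregated requests in the namespace may never exceed this
    cpu: 4
    memory: 8Gi
  min:                # guaranteed share; usage above min may be reclaimed via preemption
    cpu: 2
    memory: 4Gi
```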

Examples of YAML files and usage can be found in the plugin description.
