Kubernetes cpu requests/limits in heterogeneous cluster

Kubernetes allows you to specify a cpu limit and/or request for a POD (a minimal example follows the list below).

Limits and requests for CPU resources are measured in cpu units. One cpu, in Kubernetes, is equivalent to:

  • 1 AWS vCPU
  • 1 GCP Core
  • 1 Azure vCore
  • 1 IBM vCPU
  • 1 Hyperthread on a bare-metal Intel processor with Hyperthreading
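For reference, a minimal pod spec that sets both a cpu request and a cpu limit looks like the following (the pod name and values are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo
spec:
  containers:
  - name: app
    image: k8s.gcr.io/pause:2.0
    resources:
      requests:
        cpu: "500m"   # half a cpu unit, used by the scheduler to pick a node
      limits:
        cpu: "1"      # one full cpu unit, enforced on the node at runtime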

Unfortunately, when using a heterogeneous cluster (for instance with different processors), the cpu limit/request depends on the node to which the POD is assigned; this matters especially for real-time applications.

If we assume that:

  • we can compute a fine-tuned cpu limit for the POD for each kind of hardware in the cluster
  • we want to let the Kubernetes scheduler choose a matching node in the whole cluster

Is there a way to launch the POD so that the cpu limit/request depends on the node chosen by the Kubernetes scheduler (or on a Node label)?

The obtained behavior should be (or be equivalent to):

  • Before assigning the POD, the scheduler chooses the node by checking different cpu requests depending on the Node (or a Node Label)
  • At runtime, the kubelet checks a specific cpu limit depending on the Node (or a Node Label)

No, you can't have different requests per node type. What you can do is create a pod manifest with a node affinity for a specific kind of node, and requests which make sense for that node type. For a second kind of node, you will need a second pod manifest which makes sense for that node type. These pod manifests will differ only in their affinity spec and resource requests, so it would be handy to parameterize them. You could do this with Helm (see the sketch below), or write a simple script to do it.

This approach would let you launch a pod within a subset of your nodes with resource requests which make sense on those nodes, but there's no way to globally adjust its requests/limits based on where it ends up.
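As a rough sketch of the parameterization idea, a Helm template could take the node label value and the cpu figures from the chart's values (all names, labels, and values below are hypothetical):

# values.yaml (one file per hardware type):
#   nodeType: high-clock
#   cpuRequest: 500m
#   cpuLimit: "1"
apiVersion: v1
kind: Pod
metadata:
  name: {{ .Release.Name }}-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: hardware-type            # hypothetical node label
            operator: In
            values:
            - {{ .Values.nodeType }}
  containers:
  - name: app
    image: k8s.gcr.io/pause:2.0
    resources:
      requests:
        cpu: {{ .Values.cpuRequest | quote }}
      limits:
        cpu: {{ .Values.cpuLimit | quote }}

Installing the chart once per hardware type, each time with a different values file, then yields one pod manifest per node type.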

Before assigning the POD, the scheduler chooses the node by checking different cpu requests depending on the Node (or a Node Label)

Not with the default scheduler; the closest option is using node affinity, as Marcin suggested, so you can pick the node based on a node label. Like below:

apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: podname
    image: k8s.gcr.io/pause:2.0

In this case, you would tag the Nodes with labels to identify their type or purpose, e.g. db, cache, web, and so on. Then you set the affinity to match the node types you expect.
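For example, you could label a node with kubectl label nodes worker-1 hardware-type=high-clock (the node name and label here are hypothetical), which results in Node metadata like:

apiVersion: v1
kind: Node
metadata:
  name: worker-1
  labels:
    hardware-type: high-clock   # a label the affinity spec can match on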

requiredDuringSchedulingIgnoredDuringExecution means the pod won't be scheduled on the node if the conditions are not met.

preferredDuringSchedulingIgnoredDuringExecution means the scheduler will try to find a node that also matches that condition, but will schedule the pod anywhere possible if no nodes fit the condition specified.

Your other alternative is writing your own custom scheduler.

apiVersion: v1
kind: Pod
metadata:
  name: annotation-default-scheduler
  labels:
    name: multischeduler-example
spec:
  schedulerName: default-scheduler
  containers:
  - name: pod-with-default-annotation-container
    image: k8s.gcr.io/pause:2.0

Kubernetes ships with a default scheduler that is described here. If the default scheduler does not suit your needs, you can implement your own scheduler. This way you can write complex scheduling logic to decide where each POD should go; this is only recommended for cases that are not possible using the default scheduler.

Keep in mind that the scheduler is one of the most important components in Kubernetes; the default scheduler is battle-tested and flexible enough to handle most applications. Writing your own scheduler means losing the features provided by the default one, like load balancing, policies, and filtering. To know more about the features provided by the default scheduler, check the docs here.

If you are willing to take the risks and want to write a custom scheduler, take a look at the docs here.
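For completeness, pointing a pod at a custom scheduler only requires changing schedulerName; my-scheduler below is a placeholder for whatever name your scheduler registers under:

apiVersion: v1
kind: Pod
metadata:
  name: annotation-second-scheduler
  labels:
    name: multischeduler-example
spec:
  schedulerName: my-scheduler   # hypothetical custom scheduler name
  containers:
  - name: pod-with-second-annotation-container
    image: k8s.gcr.io/pause:2.0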

At runtime, the kubelet checks a specific cpu limit depending on the Node (or a Node Label)

The scheduler checks for resource availability on a node and then assigns the pod to that node. Each node has its own kubelet, which checks for pods that should initialize on that node; the only thing the kubelet does is start these pods, it does not decide which node a pod should go to.

The kubelet also checks for resources before initializing a pod. In case the kubelet can't initialize the pod, it will just fail, and the scheduler will take action to schedule the pod elsewhere.
