Kubernetes: how to restart a pod on a memory limit threshold

I have a deployment with a memory limit:

resources:
  limits:
    memory: 128Mi

But my app starts to fail when it gets close to the limit. Is there any way to restart the pod before it reaches a given percentage of the memory limit?

For example, if the limit is 128Mi, restart the pod when it reaches 85% of it.

I am going to address this question from the Kubernetes side.

As arjain13 already mentioned, the solution you have in mind is not the way to go, as it goes against the idea of Requests and limits:

If you set a memory limit of 4GiB for that Container, the kubelet (and container runtime) enforce the limit. The runtime prevents the container from using more than the configured resource limit. For example: when a process in the container tries to consume more than the allowed amount of memory, the system kernel terminates the process that attempted the allocation, with an out of memory (OOM) error.

You can also find an example in Exceeding a Container's memory limit:

A Container can exceed its memory request if the Node has memory available. But a Container is not allowed to use more than its memory limit. If a Container allocates more memory than its limit, the Container becomes a candidate for termination. If the Container continues to consume memory beyond its limit, the Container is terminated. If a terminated Container can be restarted, the kubelet restarts it, as with any other type of runtime failure.
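To make the request/limit distinction concrete, the resources block from the question could declare both values. This is only an illustration: the 64Mi request is an arbitrary example value, not something from the question.

```yaml
resources:
  requests:
    memory: 64Mi    # illustrative value: what the scheduler reserves on the node; usage may go above it
  limits:
    memory: 128Mi   # hard cap from the question: exceeding it gets the container OOM-killed and restarted per restartPolicy
```
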

There are two things I would recommend trying in your current use case:

  1. Debug your application to eliminate the memory leak, which appears to be the source of this issue.

  2. Use a livenessProbe:

Indicates whether the container is running. If the liveness probe fails, the kubelet kills the container, and the container is subjected to its restart policy.

It can be configured using the fields below:

  • initialDelaySeconds: Number of seconds after the container has started before liveness or readiness probes are initiated. Defaults to 0 seconds. Minimum value is 0.

  • periodSeconds: How often (in seconds) to perform the probe. Defaults to 10 seconds. Minimum value is 1.

  • timeoutSeconds: Number of seconds after which the probe times out. Defaults to 1 second. Minimum value is 1.

  • successThreshold: Minimum consecutive successes for the probe to be considered successful after having failed. Defaults to 1. Must be 1 for liveness. Minimum value is 1.

  • failureThreshold: When a probe fails, Kubernetes will try failureThreshold times before giving up. Giving up in case of liveness probe means restarting the container. In case of readiness probe the Pod will be marked Unready. Defaults to 3. Minimum value is 1.

If you set the minimal values for periodSeconds, timeoutSeconds, successThreshold and failureThreshold, you can expect more frequent checks and faster restarts.
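One way to combine a livenessProbe with the 85% threshold from the question is an exec probe that compares the container's memory usage with its limit. This is a sketch under several assumptions: the node uses cgroup v2 (so /sys/fs/cgroup/memory.current and /sys/fs/cgroup/memory.max describe the container's own cgroup), the image ships /bin/sh and cat, and the container name and image are hypothetical. Note that memory.current also counts page cache, so this check can fire earlier than the application's own heap usage would suggest. With the 128Mi limit, 85% is about 108.8Mi (114,085,068 bytes).

```yaml
containers:
  - name: my-app              # hypothetical container name
    image: my-app:latest      # hypothetical image; must contain /bin/sh and cat
    resources:
      limits:
        memory: 128Mi
    livenessProbe:
      exec:
        command:
          - /bin/sh
          - -c
          - |
            # Assumes cgroup v2; if the file is missing, report healthy instead of restart-looping
            [ -r /sys/fs/cgroup/memory.current ] || exit 0
            used=$(cat /sys/fs/cgroup/memory.current)
            max=$(cat /sys/fs/cgroup/memory.max)
            # "max" means no limit is set, so there is nothing to compare against
            [ "$max" = "max" ] && exit 0
            # Exit 0 (healthy) only while usage is below 85% of the limit
            [ $((used * 100)) -lt $((max * 85)) ]
      initialDelaySeconds: 10  # illustrative start-up grace period
      periodSeconds: 1         # minimal values: check every second...
      timeoutSeconds: 1
      successThreshold: 1      # must be 1 for liveness probes
      failureThreshold: 1      # ...and restart after a single failed check
```

With failureThreshold set to 1, a single reading above the threshold restarts the container; raising it trades restart speed for tolerance of short spikes.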

Below you will find some useful sources and guides:

You cannot do that using resources within the pods, as it defeats the purpose of limits. Instead, you can set up a HorizontalPodAutoscaler, which will spin up a new pod whenever usage reaches a configured CPU or memory threshold.

The guide to setting up the HPA can be found here, with some examples here.
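A minimal HorizontalPodAutoscaler for memory might look like the sketch below, using the autoscaling/v2 API. The names my-app and my-app-hpa and the replica bounds are hypothetical, and a metrics source such as metrics-server is assumed to be installed. Keep in mind that averageUtilization is a percentage of the pods' memory request, not their limit, so the Deployment needs a memory request for this to work. An HPA also adds replicas rather than restarting the pod that is close to its limit.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa            # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # hypothetical Deployment name
  minReplicas: 1              # illustrative bounds
  maxReplicas: 5
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 85   # percent of the memory *request*, averaged across pods
```
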
