
Resource-utilization based liveness checks in Kubernetes

Original post: 2019-01-04 04:05:38 3 1 kubernetes

In Kubernetes, a liveness probe periodically checks whether the container is accessible and, if it is not, kills it and spawns a new one.

We have a Java webapp, and in most cases I see the application become unavailable due to memory pressure. We have a liveness probe, but since the health check service call doesn't take much memory, it succeeds even while many other requests that require more memory are stuck waiting.

The GC runs continuously to reclaim memory, but to no avail. The instance never recovers. In such a state, I would like Kubernetes to kill the pod, but since the liveness probe still succeeds, it doesn't. One way to handle this would be to make the liveness probe a more resource-intensive operation, but then it would consume more cycles and put additional load on the system.

So, I would like some kind of liveness check that monitors the slope of the graph of garbage collection counts of the Java process. Put another way, I want my liveness probe to depend on telemetry data. Is there any way to achieve that?

1 answer

Health probes are most often HTTP requests that check the status code returned by an HTTP endpoint. However, you can also execute a command as the health check; the Kubernetes documentation provides an example that runs cat on a file. Instead of running cat on a file, you could run a custom script that checks the stat you want (e.g. Java heap size). If the script is complex, you may want to include it in your image or mount it into the container from a ConfigMap. There are also ways to get metrics other than running bash commands, for example the k8s metrics API. Or you could have your Java app report its status directly through a REST endpoint that the probe calls (e.g. something like Spring Boot Actuator).
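To illustrate the last option, here is a minimal sketch of an in-process check the app could expose through its health endpoint. It is not taken from the question or from any library: the class name `GcPressureCheck`, the method names, and the 0.5 threshold are all hypothetical. It reads accumulated GC time from the standard `GarbageCollectorMXBean` and reports the app as not live when the share of wall-clock time spent in GC since the previous probe exceeds the threshold, which approximates the "slope" the question asks about. With mostly concurrent collectors the reported time may include concurrent phases, so the threshold would need tuning.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPressureCheck {
    // Hypothetical threshold: maximum tolerated share of the window spent in GC.
    private final double maxGcFraction;
    private long lastGcMillis = totalGcMillis();
    private long lastWallMillis = System.currentTimeMillis();

    public GcPressureCheck(double maxGcFraction) {
        this.maxGcFraction = maxGcFraction;
    }

    /** Sum of accumulated collection time over all collectors, in milliseconds. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // returns -1 if the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Share of the elapsed window spent in GC; 0.0 for an empty window. */
    static double gcFraction(long gcDeltaMillis, long wallDeltaMillis) {
        if (wallDeltaMillis <= 0) {
            return 0.0;
        }
        return (double) gcDeltaMillis / wallDeltaMillis;
    }

    /** Compares against the previous call, then starts a new window. */
    public synchronized boolean isLive() {
        long gcNow = totalGcMillis();
        long wallNow = System.currentTimeMillis();
        double fraction = gcFraction(gcNow - lastGcMillis, wallNow - lastWallMillis);
        lastGcMillis = gcNow;
        lastWallMillis = wallNow;
        return fraction <= maxGcFraction;
    }

    public static void main(String[] args) {
        // Demo only: in a real app, one long-lived instance would back the health endpoint.
        GcPressureCheck check = new GcPressureCheck(0.5);
        System.out.println("live = " + check.isLive());
    }
}
```

The health endpoint would return a non-2xx status (for example 503) whenever `isLive()` is false, and the pod's `livenessProbe` would point its `httpGet` at that endpoint. Because each call resets the window, the measurement interval is effectively the probe's `periodSeconds`, and `failureThreshold` controls how many consecutive bad windows are tolerated before the container is restarted.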