How to get history of Pods run on Kubernetes Node?
In our Kubernetes cluster, we are running into sporadic situations where a cluster node runs out of memory and Linux invokes the OOM killer. Looking at the logs, it appears that the Pods scheduled onto the Node are requesting more memory than the Node can allocate.

The issue is that, when the OOM killer is invoked, it prints out a list of processes and their memory usage. However, as all of our Docker containers are Java services, the "process name" just appears as "java", which does not let us track down which particular Pod is causing the issues.

How can I get the history of which Pods were scheduled to run on a particular Node, and when?
You can now use the kube-state-metrics metric kube_pod_container_status_terminated_reason to detect OOM events:
kube_pod_container_status_terminated_reason{reason="OOMKilled"}
kube_pod_container_status_terminated_reason{container="addon-resizer",endpoint="http-metrics",instance="100.125.128.3:8080",job="kube-state-metrics",namespace="monitoring",pod="kube-state-metrics-569ffcff95-t929d",reason="OOMKilled",service="kube-state-metrics"}
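If you want to inspect this metric outside Prometheus, the raw text exposition served by kube-state-metrics can be filtered directly. A minimal offline sketch — the sample series below are made up to mirror the one above; in a real cluster you would curl the kube-state-metrics /metrics endpoint instead:

```shell
# Hypothetical kube-state-metrics exposition-format output, shaped like the
# series above. Live equivalent: curl http://<kube-state-metrics>:8080/metrics
cat <<'EOF' > /tmp/metrics.txt
kube_pod_container_status_terminated_reason{container="addon-resizer",namespace="monitoring",pod="kube-state-metrics-569ffcff95-t929d",reason="OOMKilled"} 1
kube_pod_container_status_terminated_reason{container="app",namespace="default",pod="web-0",reason="Completed"} 1
EOF

# Keep only the OOMKilled series and extract namespace/pod from the labels.
grep 'reason="OOMKilled"' /tmp/metrics.txt \
  | sed -n 's|.*namespace="\([^"]*\)".*pod="\([^"]*\)".*|\1/\2|p'
```

For the sample data this prints `monitoring/kube-state-metrics-569ffcff95-t929d`, i.e. the namespace and pod of each OOM-killed container.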
We use Prometheus to monitor OOM events.

This expression should report the number of times that memory usage has reached the limits:
rate(container_memory_failcnt{pod_name!=""}[5m]) > 0
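To get notified instead of querying by hand, the expression can be wrapped in a Prometheus alerting rule. A sketch, assuming the standard rule-file format — the alert name, duration, and severity label are arbitrary choices, not from the original answer:

```yaml
groups:
  - name: memory
    rules:
      - alert: ContainerMemoryAtLimit
        # Fires when a container has been hitting its memory limit
        # (failcnt increasing) for at least 2 minutes.
        expr: rate(container_memory_failcnt{pod_name!=""}[5m]) > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Container in pod {{ $labels.pod_name }} is hitting its memory limit"
```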
FYI: this is the next best thing to proper docs, the code.
Event history for your particular namespace, ordered by creationTimestamp:
kubectl get events -n YOURNAMESPACE -o wide --sort-by=.metadata.creationTimestamp
Or if you want to check the event history for all namespaces, ordered by creationTimestamp:
kubectl get events --all-namespaces -o wide --sort-by=.metadata.creationTimestamp
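Note that events age out quickly (one hour by default), so for OOM hunting it helps to capture the output and filter it. An offline sketch, using made-up rows shaped like `kubectl get events -o wide` output:

```shell
# Hypothetical captured event rows; live equivalent:
#   kubectl get events --all-namespaces -o wide --sort-by=.metadata.creationTimestamp > /tmp/events.txt
cat <<'EOF' > /tmp/events.txt
2m    Warning   OOMKilling   node/worker-1   Memory cgroup out of memory: Kill process 1234 (java)
5m    Normal    Scheduled    pod/web-0       Successfully assigned default/web-0 to worker-1
9m    Normal    Pulled       pod/web-0       Container image already present on machine
EOF

# Narrow the history down to OOM-related lines.
grep -i 'oom' /tmp/events.txt
```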
I guess your pods don't have requests and limits set, or the values are not ideal.

If you set this up properly, when a pod starts to use too much RAM, that pod will be killed and you will be able to find out what is causing the issues.
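For reference, a sketch of what requests and limits look like in a Pod spec — the name, image, and values are hypothetical; the memory limit is what makes the kernel OOM-kill just that container instead of triggering a node-wide OOM:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: java-service              # hypothetical name
spec:
  containers:
    - name: app
      image: example/java-service:1.0   # hypothetical image
      resources:
        requests:
          memory: "512Mi"         # what the scheduler reserves on the node
        limits:
          memory: "1Gi"           # exceeding this OOM-kills only this container
```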
About seeing all the pods on a node, you can go with kubectl get events or docker ps -a on the node, as cited in the other answers/comments.

One approach is to look at the docker ps -a output and correlate the container names with the pod's containers.
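With the Docker runtime that correlation is mechanical: the kubelet names containers `k8s_<container>_<pod>_<namespace>_<uid>_<restart-count>`, so the pod and namespace can be read straight out of `docker ps -a --format '{{.Names}}'`. An offline sketch with made-up container names:

```shell
# Hypothetical names in the shape that `docker ps -a --format '{{.Names}}'`
# returns under the kubelet's Docker runtime. The "POD" entry is the pause
# container, which we skip.
cat <<'EOF' | awk -F_ '/^k8s_/ && $2 != "POD" { print "pod: " $3 "  namespace: " $4 }'
k8s_java-service_payments-5d9f8c7b4-abcde_prod_0c1d2e3f_0
k8s_POD_payments-5d9f8c7b4-abcde_prod_0c1d2e3f_0
EOF
```

For the sample names this prints `pod: payments-5d9f8c7b4-abcde  namespace: prod`; from there the Java PID in the OOM killer log can be matched with `docker inspect` on the surviving containers.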