How to get history of Pods run on Kubernetes Node?
In our Kubernetes cluster, we are running into sporadic situations where a cluster node runs out of memory and Linux invokes the OOM killer. Looking at the logs, it appears that the Pods scheduled onto the Node are requesting more memory than the Node can allocate.

The issue is that, when the OOM killer is invoked, it prints out a list of processes and their memory usage. However, as all of our Docker containers are Java services, the "process name" just appears as "java", which does not let us track down which particular Pod is causing the issues.

How can I get the history of which Pods were scheduled to run on a particular Node, and when?
You can now use the kube-state-metrics metric kube_pod_container_status_terminated_reason to detect OOM events:
kube_pod_container_status_terminated_reason{reason="OOMKilled"}
kube_pod_container_status_terminated_reason{container="addon-resizer",endpoint="http-metrics",instance="100.125.128.3:8080",job="kube-state-metrics",namespace="monitoring",pod="kube-state-metrics-569ffcff95-t929d",reason="OOMKilled",service="kube-state-metrics"}
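If you want to inspect this metric outside Prometheus, the raw text exposition served by kube-state-metrics can be filtered directly. A minimal offline sketch — the sample series below are made up to mirror the one above; in a real cluster you would curl the kube-state-metrics /metrics endpoint instead:

```shell
# Hypothetical kube-state-metrics exposition-format output, shaped like the
# series above. Live equivalent: curl http://<kube-state-metrics>:8080/metrics
cat <<'EOF' > /tmp/metrics.txt
kube_pod_container_status_terminated_reason{container="addon-resizer",namespace="monitoring",pod="kube-state-metrics-569ffcff95-t929d",reason="OOMKilled"} 1
kube_pod_container_status_terminated_reason{container="app",namespace="default",pod="web-0",reason="Completed"} 1
EOF

# Keep only the OOMKilled series and extract namespace/pod from the labels.
grep 'reason="OOMKilled"' /tmp/metrics.txt \
  | sed -n 's|.*namespace="\([^"]*\)".*pod="\([^"]*\)".*|\1/\2|p'
```

For the sample data this prints `monitoring/kube-state-metrics-569ffcff95-t929d`, i.e. the namespace and pod of each OOM-killed container.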
We use Prometheus to monitor OOM events.

This expression should report the number of times that memory usage has reached the limits:
rate(container_memory_failcnt{pod_name!=""}[5m]) > 0
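To get notified instead of querying by hand, the expression can be wrapped in a Prometheus alerting rule. A sketch, assuming the standard rule-file format — the alert name, duration, and severity label are arbitrary choices, not from the original answer:

```yaml
groups:
  - name: memory
    rules:
      - alert: ContainerMemoryAtLimit
        # Fires when a container has been hitting its memory limit
        # (failcnt increasing) for at least 2 minutes.
        expr: rate(container_memory_failcnt{pod_name!=""}[5m]) > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Container in pod {{ $labels.pod_name }} is hitting its memory limit"
```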
FYI: this is the next best thing to proper docs, the code.
Event history for your particular namespace, ordered by creationTimestamp:
kubectl get events -n YOURNAMESPACE -o wide --sort-by=.metadata.creationTimestamp
Or if you want to check the event history for all namespaces, ordered by creationTimestamp:
kubectl get events --all-namespaces -o wide --sort-by=.metadata.creationTimestamp
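Note that events age out quickly (one hour by default), so for OOM hunting it helps to capture the output and filter it. An offline sketch, using made-up rows shaped like `kubectl get events -o wide` output:

```shell
# Hypothetical captured event rows; live equivalent:
#   kubectl get events --all-namespaces -o wide --sort-by=.metadata.creationTimestamp > /tmp/events.txt
cat <<'EOF' > /tmp/events.txt
2m    Warning   OOMKilling   node/worker-1   Memory cgroup out of memory: Kill process 1234 (java)
5m    Normal    Scheduled    pod/web-0       Successfully assigned default/web-0 to worker-1
9m    Normal    Pulled       pod/web-0       Container image already present on machine
EOF

# Narrow the history down to OOM-related lines.
grep -i 'oom' /tmp/events.txt
```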
I guess your pods don't have requests and limits set, or the values are not ideal.

If you set this up properly, when a pod starts to use too much RAM, that pod will be killed and you will be able to find out what is causing the issues.
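For reference, a sketch of what requests and limits look like in a Pod spec — the name, image, and values are hypothetical; the memory limit is what makes the kernel OOM-kill just that container instead of triggering a node-wide OOM:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: java-service              # hypothetical name
spec:
  containers:
    - name: app
      image: example/java-service:1.0   # hypothetical image
      resources:
        requests:
          memory: "512Mi"         # what the scheduler reserves on the node
        limits:
          memory: "1Gi"           # exceeding this OOM-kills only this container
```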
About seeing all the pods on a node, you can go with kubectl get events or docker ps -a on the node, as cited in the other answers/comments.

One approach is to look at the docker ps -a output and correlate the container names with the pod's containers.
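With the Docker runtime that correlation is mechanical: the kubelet names containers `k8s_<container>_<pod>_<namespace>_<uid>_<restart-count>`, so the pod and namespace can be read straight out of `docker ps -a --format '{{.Names}}'`. An offline sketch with made-up container names:

```shell
# Hypothetical names in the shape that `docker ps -a --format '{{.Names}}'`
# returns under the kubelet's Docker runtime. The "POD" entry is the pause
# container, which we skip.
cat <<'EOF' | awk -F_ '/^k8s_/ && $2 != "POD" { print "pod: " $3 "  namespace: " $4 }'
k8s_java-service_payments-5d9f8c7b4-abcde_prod_0c1d2e3f_0
k8s_POD_payments-5d9f8c7b4-abcde_prod_0c1d2e3f_0
EOF
```

For the sample names this prints `pod: payments-5d9f8c7b4-abcde  namespace: prod`; from there the Java PID in the OOM killer log can be matched with `docker inspect` on the surviving containers.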