
How to get history of Pods run on Kubernetes Node?

In our Kubernetes cluster, we are running into sporadic situations where a cluster node runs out of memory and Linux invokes the OOM killer. Looking at the logs, it appears that the Pods scheduled onto the Node are requesting more memory than the Node can allocate.

The issue is that, when the OOM killer is invoked, it prints out a list of processes and their memory usage. However, as all of our Docker containers are Java services, the "process name" just appears as "java", which does not let us track down which particular Pod is causing the issue.
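One way to work around the anonymous "java" entries is to map the PID printed by the OOM killer back to a pod: on nodes using the kubelet's cgroupfs layout, `/proc/<pid>/cgroup` contains a path component of the form `pod<uid>`. The sketch below extracts that UID from a cgroup line; the sample line and UID are hypothetical, and the exact path layout varies across cgroup drivers and Kubernetes versions.

```python
import re

def pod_uid_from_cgroup(cgroup_text):
    """Extract the pod UID from /proc/<pid>/cgroup contents.

    Assumes the kubelet's cgroupfs layout, where the pod's cgroup
    path contains a 'pod<uid>' component. Returns None if no pod
    UID is found (e.g. a host process).
    """
    # A pod UID is a 36-character UUID (hex digits and hyphens).
    match = re.search(r"pod([0-9a-f-]{36})", cgroup_text)
    return match.group(1) if match else None

# Hypothetical /proc/<pid>/cgroup line for an OOM-killed java process:
sample = "11:memory:/kubepods/burstable/podd4b0cde3-0c53-11ea-a3a3-42010a8000d7/8f2a"
print(pod_uid_from_cgroup(sample))
```

With the UID in hand, you can resolve the pod name while it still exists, e.g. with `kubectl get pods --all-namespaces -o json` and matching on `.metadata.uid`.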

How can I get the history of which Pods were scheduled to run on a particular Node, and when?

You can now use the kube-state-metrics metric kube_pod_container_status_terminated_reason to detect OOM events:

kube_pod_container_status_terminated_reason{reason="OOMKilled"}

kube_pod_container_status_terminated_reason{container="addon-resizer",endpoint="http-metrics",instance="100.125.128.3:8080",job="kube-state-metrics",namespace="monitoring",pod="kube-state-metrics-569ffcff95-t929d",reason="OOMKilled",service="kube-state-metrics"}
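If you run Prometheus with Alertmanager, the metric above can drive an alert. A minimal sketch of such a rule follows; the group name, severity label, and thresholds are illustrative assumptions, not part of kube-state-metrics itself.

```yaml
groups:
  - name: oom-alerts            # illustrative group name
    rules:
      - alert: PodOOMKilled
        # Fires while a container's last termination reason is OOMKilled.
        expr: kube_pod_container_status_terminated_reason{reason="OOMKilled"} == 1
        labels:
          severity: warning     # illustrative label
        annotations:
          summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} was OOMKilled"
```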

We use Prometheus to monitor OOM events.

This expression should report the number of times that memory usage has reached the limit:

rate(container_memory_failcnt{pod_name!=""}[5m]) > 0

FYI: in the absence of proper docs, the next best reference is the code itself.

Event history for your particular namespace, ordered by creationTimestamp:

kubectl get events -n YOURNAMESPACE -o wide --sort-by=.metadata.creationTimestamp

Or, if you want to check the event history for all namespaces, ordered by creationTimestamp:

kubectl get events --all-namespaces -o wide --sort-by=.metadata.creationTimestamp
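To turn that event stream into a per-node pod history, you can filter the "Scheduled" events for a given node. The sketch below operates on JSON in the shape `kubectl get events --all-namespaces -o json` emits; the sample event is hand-made for illustration, and note that events expire (by default after roughly an hour), so this only covers recent history.

```python
def pods_scheduled_on_node(events_json, node_name):
    """Return (timestamp, namespace, pod) tuples for pods scheduled
    onto node_name, from `kubectl get events -o json` output.

    Relies on the scheduler's "Scheduled" event, whose message has
    the form "Successfully assigned <ns>/<pod> to <node>".
    """
    hits = []
    for ev in events_json.get("items", []):
        if ev.get("reason") != "Scheduled":
            continue
        if node_name not in ev.get("message", ""):
            continue
        obj = ev.get("involvedObject", {})
        hits.append((ev.get("firstTimestamp"),
                     obj.get("namespace"), obj.get("name")))
    return sorted(hits)

# Minimal hand-made sample in the shape kubectl emits (assumed):
sample = {"items": [{
    "reason": "Scheduled",
    "firstTimestamp": "2020-01-01T00:00:00Z",
    "message": "Successfully assigned default/my-java-svc-abc to node-1",
    "involvedObject": {"kind": "Pod", "namespace": "default",
                       "name": "my-java-svc-abc"}}]}
print(pods_scheduled_on_node(sample, "node-1"))
```

For longer retention you would ship events to an external store (e.g. via your logging pipeline) rather than rely on the API server's event TTL.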

I guess your pods don't have requests and limits set, or the values are not ideal.

If you set this up properly, then when a pod starts to use too much RAM, that pod will be killed, and you will be able to find out what is causing the issues.
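A minimal sketch of such a container spec follows; the values are illustrative and should be tuned to your services' actual footprint. With a memory limit set, the kernel OOM-kills only the offending container, and Kubernetes records the `OOMKilled` reason on that pod instead of an anonymous "java" process.

```yaml
# Fragment of a Pod/Deployment container spec (values are examples only)
containers:
  - name: my-java-svc          # illustrative name
    image: my-registry/my-java-svc:1.0
    resources:
      requests:
        memory: "512Mi"        # what the scheduler reserves on the node
        cpu: "250m"
      limits:
        memory: "1Gi"          # exceeding this gets the container OOMKilled
```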

About seeing all the pods on a node, you can go with kubectl get events or docker ps -a on the node, as cited in the other answers/comments.

One approach is to look at the docker ps -a output and correlate the container names with your pods' containers.

