Kubernetes pod/containers running but not listed with 'kubectl get pods'?

Original post: 2018-02-28 20:12:38. Tags: kubernetes, kubectl, telegraf

I have an issue that, at face value, appears to indicate that two deployments are running in parallel within my kube cluster, but 'kubectl get pods' shows only one of them.

My deployment consists of a pod with two containers. One container runs a golang application that exposes an HTTP API endpoint; the other runs Telegraf, which reads metrics from that endpoint and pushes them to InfluxDB. When writing the data to InfluxDB, I tag it with the source host, which is set to the name of the pod. I use Grafana to plot the metrics, and I can clearly see streaming data coming in from two hosts (e.g. I can set the "WHERE host=" query clause to either "application-pod-name-231620957-7n32f" or "application-pod-name-1931165991-x154c").

Based on the above, I'm fairly certain that two deployments of the pod are running, each with the two containers (one providing application metrics and the other running Telegraf and sending those metrics to InfluxDB).

However, kube seems to think that one of the deployments doesn't exist. As mentioned, "kubectl get pods" doesn't display the second pod name in any way, shape, or form. It lists only one of them.

Has anyone seen this? Any ideas for further troubleshooting? I've tried using the pod name (which I have from Telegraf) to query more information with kubectl, but I always get the response that the pod doesn't exist... but it must exist! It's sending live data!
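
One way to make the mismatch concrete is to cross-check the host tags seen in InfluxDB against the pods the API server knows about. The following is a minimal sketch, not a definitive tool: it assumes the host tags have already been collected into a list, and that the pod list comes from the JSON output of `kubectl get pods -o json`. The function names `pod_names` and `unlisted_hosts` are hypothetical.

```python
import json


def pod_names(pods_json):
    """Return the pod names found in the output of `kubectl get pods -o json`."""
    return {item["metadata"]["name"] for item in json.loads(pods_json)["items"]}


def unlisted_hosts(reporting_hosts, pods_json):
    """Return hosts that are sending metrics but are unknown to the API server."""
    return sorted(set(reporting_hosts) - pod_names(pods_json))
```

Feeding it the output of `kubectl get pods --all-namespaces -o json` instead also rules out the simpler explanation that the missing pod lives in a different namespace.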

1 Solution

We had been experiencing issues with a node within the cluster. Specifically, the node was experiencing GC failures, and communication into the cluster from that node was broken. Because of these failures, someone on our team performed a 'kubectl delete' on the node from within the cluster. After that, the node itself kept running, but the kubelet on it remained in a broken state, so the node couldn't automatically re-register itself with the cluster. This node happened to be running the second pod, and the pods on the node continued running without issue. In our case the node was running on AWS, where the way to avoid this situation is to reboot the node from either the AWS console or the AWS API.
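
The check and the reboot described above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: `expected` is a hypothetical inventory, maintained by you, that maps node names to EC2 instance IDs; the node list comes from `kubectl get nodes -o json`; and `ec2_client` is any object with the boto3 EC2 client's `reboot_instances(InstanceIds=[...])` method. The client is passed in so that the logic can be tried out without AWS credentials.

```python
import json


def registered_node_names(nodes_json):
    """Return the node names found in the output of `kubectl get nodes -o json`."""
    return {item["metadata"]["name"] for item in json.loads(nodes_json)["items"]}


def reboot_unregistered(ec2_client, expected, nodes_json):
    """Reboot every expected instance whose node is not registered.

    `expected` maps node name -> EC2 instance ID (hypothetical inventory).
    Returns the sorted list of instance IDs that were rebooted.
    """
    registered = registered_node_names(nodes_json)
    missing = sorted(iid for name, iid in expected.items() if name not in registered)
    if missing:
        # One API call covers all missing instances; skipped when none are missing.
        ec2_client.reboot_instances(InstanceIds=missing)
    return missing
```

In real use, `ec2_client` would be `boto3.client("ec2")`. Rebooting is disruptive, so it is probably worth printing the list of missing nodes and reviewing it before letting anything like this act on its own.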