
Kubernetes pod stuck in state=Terminating after node goes to status=NotReady?

I have 3 nodes in a k8s cluster, all of which are masters, i.e. I have removed the node-role.kubernetes.io/master taint.

I physically removed the network cable on foo2, so I have

kubectl get nodes
NAME   STATUS     ROLES    AGE     VERSION
foo1   Ready      master   3d22h   v1.13.5
foo2   NotReady   master   3d22h   v1.13.5
foo3   Ready      master   3d22h   v1.13.5

After several hours, some of the pods are still in STATUS = Terminating, though I think they should be Terminated?
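To see which pods are stuck on the unreachable node, something like this should work (a sketch; spec.nodeName is a supported field selector for pods):

# List every pod still bound to the unreachable node foo2
kubectl get pods --all-namespaces -o wide --field-selector spec.nodeName=foo2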

I read at https://www.bluematador.com/docs/troubleshooting/kubernetes-pod

In rare cases, it is possible for a pod to get stuck in the terminating state. This is detected by finding any pods where every container has been terminated, but the pod is still running. Usually, this is caused when a node in the cluster gets taken out of service abruptly, and the cluster scheduler and controller-manager do not clean up all of the pods on that node.

Solving this issue is as simple as manually deleting the pod using kubectl delete pod.

The pod's describe output says that being unreachable will be tolerated for 5 minutes...

Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  etcd-data:
    Type:        EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:      
    SizeLimit:   <unset>
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:          <none>
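Those two 300s entries are, I believe, the defaults that the DefaultTolerationSeconds admission plugin adds to pods that do not specify their own tolerations. A quick way to inspect what a pod actually carries (a sketch, using the pod name from below):

# Print the tolerations on the stuck pod
kubectl get pod etcd-lns4g5xkcw -o jsonpath='{.spec.tolerations}'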

I have tried kubectl delete pod etcd-lns4g5xkcw, which just hung, though forcing it does work, as per this answer...

kubectl delete pod etcd-lns4g5xkcw  --force=true --grace-period=0
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "etcd-lns4g5xkcw" force deleted

(1) Why is this happening? Shouldn't it change to Terminated?

(2) Where is STATUS = Terminating even coming from? At https://v1-13.docs.kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/ I see only Waiting/Running/Terminated as the options.

Pod volume and network cleanup can consume more time while a pod is in Terminating status. The proper way to handle this is to drain the node, so that pods are terminated successfully within their grace period. Because you unplugged the network cable, the node changed its status to NotReady while pods were still running on it, and because of that those pods could not finish terminating.
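For planned maintenance, the drain would happen while the kubelet is still reachable, along these lines (a sketch; in v1.13 the flag is --delete-local-data, renamed --delete-emptydir-data in later releases):

# Evict all pods from foo2 before taking it offline
kubectl drain foo2 --ignore-daemonsets --delete-local-data

# Allow scheduling on the node again after maintenance
kubectl uncordon foo2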

You may find this information from the k8s documentation about the Terminating status useful:

Kubernetes (versions 1.5 or newer) will not delete Pods just because a Node is unreachable. The Pods running on an unreachable Node enter the 'Terminating' or 'Unknown' state after a timeout. Pods may also enter these states when the user attempts graceful deletion of a Pod on an unreachable Node.
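Regarding question (2): Terminating is not one of the documented pod phases. kubectl get pods prints it whenever metadata.deletionTimestamp is set on a pod, even though the phase itself typically still reads Running. A sketch to confirm this on the stuck pod:

# Phase stays Running while deletionTimestamp marks the pod as Terminating
kubectl get pod etcd-lns4g5xkcw -o jsonpath='{.status.phase}{"\t"}{.metadata.deletionTimestamp}{"\n"}'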

There are 3 suggested ways to remove such a pod from the apiserver:

1. The Node object is deleted (either by you, or by the Node Controller).
2. The kubelet on the unresponsive Node starts responding, kills the Pod and removes the entry from the apiserver.
3. Force deletion of the Pod by the user.
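The first option amounts to a single command; note that it removes the Node object itself, after which the control plane garbage-collects the pods that were bound to it (a sketch, using foo2 from the question):

# Deleting the Node object lets the control plane clean up its pods
kubectl delete node foo2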

Here you can find more information about background deletion in the official k8s documentation.
