Pod shows as “Terminating” after node is shut down
There was a pod named n404-neo4j-core-1 running on k8s-slave2. After k8s-slave2 was turned off, the pod got stuck in the Terminating state.
I was expecting the pod to be deleted and a new pod to be created on another node. Until this problem is resolved, the neo4j cluster cannot maintain HA.
kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
n404-neo4j-core-0 1/1 Running 0 3d19h *** k8s-node1 <none> <none>
n404-neo4j-core-1 1/1 Terminating 0 78m *** k8s-slave2 <none> <none>
kubectl describe pod n404-neo4j-core-1
Name: n404-neo4j-core-1
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: k8s-slave2/10.176.6.67
Start Time: Mon, 01 Jun 2020 23:53:13 -0700
Labels: app.kubernetes.io/component=core
app.kubernetes.io/instance=n404
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=neo4j
controller-revision-hash=n404-neo4j-core-67484bd88
helm.sh/chart=neo4j-4.0.4-1
statefulset.kubernetes.io/pod-name=n404-neo4j-core-1
Annotations: <none>
Status: Terminating (lasts 21m)
Termination Grace Period: 30s
IP: 10.36.0.1
Controlled By: StatefulSet/n404-neo4j-core
Containers:
n404-neo4j:
Container ID: docker://a045d7747678ca62734800d153d01f634b9972b527289541d357cbc27456bf7b
Image: neo4j:4.0.4-enterprise
Image ID: docker-pullable://neo4j@sha256:714d83e56a5db61eb44d65c114720f8cb94b06cd044669e16957aac1bd1b5c34
Ports: 5000/TCP, 7000/TCP, 6000/TCP, 7474/TCP, 7687/TCP, 3637/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP, 0/TCP
Command:
/bin/bash
-c
export core_idx=$(hostname | sed 's|.*-||')
# Processes key configuration elements and exports env vars we need.
. /helm-init/init.sh
# We advertise the discovery-lb addresses (see discovery-lb.yaml) because
# it is for internal cluster comms and is limited to private ports.
export DISCOVERY_HOST="discovery-n404-neo4j-${core_idx}.default.svc.cluster.local"
export NEO4J_causal__clustering_discovery__advertised__address="$DISCOVERY_HOST:5000"
export NEO4J_causal__clustering_transaction__advertised__address="$DISCOVERY_HOST:6000"
export NEO4J_causal__clustering_raft__advertised__address="$DISCOVERY_HOST:7000"
echo "Starting Neo4j CORE $core_idx on $HOST"
exec /docker-entrypoint.sh "neo4j"
State: Running
Started: Mon, 01 Jun 2020 23:53:14 -0700
Ready: True
Restart Count: 0
Liveness: tcp-socket :7687 delay=300s timeout=2s period=10s #success=1 #failure=3
Readiness: tcp-socket :7687 delay=120s timeout=2s period=10s #success=1 #failure=3
Environment Variables from:
n404-neo4j-common-config ConfigMap Optional: false
n404-neo4j-core-config ConfigMap Optional: false
Environment:
NEO4J_SECRETS_PASSWORD: <set to the key 'neo4j-password' in secret 'n404-neo4j-secrets'> Optional: false
Mounts:
/data from datadir (rw)
/helm-init from init-script (rw)
/plugins from plugins (rw)
/var/run/secrets/kubernetes.io/serviceaccount from n404-neo4j-sa-token-jp7g9 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady True
PodScheduled True
Volumes:
datadir:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: datadir-n404-neo4j-core-1
ReadOnly: false
init-script:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: n404-init-script
Optional: false
plugins:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
n404-neo4j-sa-token-jp7g9:
Type: Secret (a volume populated by a Secret)
SecretName: n404-neo4j-sa-token-jp7g9
Optional: false
QoS Class: BestEffort
Node-Selectors: svc=neo4j
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Kubernetes (versions 1.5 or newer) will not delete Pods just because a Node is unreachable. The Pods running on an unreachable Node enter the 'Terminating' or 'Unknown' state after a timeout. Pods may also enter these states when the user attempts graceful deletion of a Pod on an unreachable Node. The only ways in which a Pod in such a state can be removed from the apiserver are as follows:
1. The Node object is deleted (either by you, or by the Node Controller).
2. The kubelet on the unresponsive Node starts responding, kills the Pod, and removes the entry from the apiserver.
3. Force deletion of the Pod by the user.
The recommended best practice is to use the first or second approach. If a Node is confirmed to be dead (e.g. permanently disconnected from the network, powered down, etc.), then delete the Node object. If the Node is suffering from a network partition, then try to resolve it or wait for it to resolve. When the partition heals, the kubelet will complete the deletion of the Pod and free up its name in the apiserver.
Normally, the system completes the deletion once the Pod is no longer running on a Node, or the Node is deleted by an administrator. You may override this by force deleting the Pod.
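Using the pod name from the question, force deletion would look roughly like this (a sketch; use with care, since it bypasses confirmation from the unreachable kubelet and therefore the StatefulSet's at-most-one guarantee):

```shell
# Force-delete the stuck pod: the object is removed from the apiserver
# immediately, without waiting for the (unreachable) kubelet to confirm.
kubectl delete pod n404-neo4j-core-1 --grace-period=0 --force

# The StatefulSet controller will then recreate n404-neo4j-core-1,
# which can be scheduled onto another node matching svc=neo4j.
```

Note that if the dead node later comes back while the replacement is running, two pods with the same identity could briefly coexist, which is why deleting the Node object first is the safer route.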
You should not take down a Kubernetes node abruptly. If you do, you'll end up with strange scenarios like this one.
First, cordon the node. This tells the scheduler that the node is no longer available for scheduling.
kubectl cordon <node>
Then, drain the node. This evicts the running pods so they are recreated on other nodes.
kubectl drain <node>
Now you can safely remove the node from the cluster.
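Putting the steps together for the node in the question (k8s-slave2 is taken from the question; the extra drain flags are typical for a node running DaemonSet pods and emptyDir volumes, but exact flag names depend on your kubectl version):

```shell
# 1. Mark the node unschedulable so no new pods are placed on it.
kubectl cordon k8s-slave2

# 2. Evict the running pods. DaemonSet pods cannot be evicted, so skip
#    them; --delete-local-data acknowledges that emptyDir volumes (such
#    as the 'plugins' volume above) will be lost. Newer kubectl versions
#    rename this flag to --delete-emptydir-data.
kubectl drain k8s-slave2 --ignore-daemonsets --delete-local-data

# 3. Remove the Node object; the control plane then cleans up any pods
#    still bound to the node.
kubectl delete node k8s-slave2
```

After the drain, the StatefulSet recreates its pods on the remaining schedulable nodes, so the neo4j cluster can keep its quorum.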
This is the so-called 'at most one' semantic in K8S; please check the link: https://v1-16.docs.kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/
Copied from the link: StatefulSet ensures that, at any time, there is at most one Pod with a given identity running in a cluster. This is referred to as at most one semantics provided by a StatefulSet.