
Kubelet SyncLoop stops (v1.1.1)

I am trying to resize a replication controller from 2 to 0. The two pods to be deleted are scheduled on node1 and node2 respectively. The pod on node2 gets deleted without a problem, but the one on node1 stays active and running according to both kubectl get pods and docker ps.

Symptoms:

kubectl scale rc my-app-v1 --replicas=0
kubectl get rc my-app-v1
# waited several minutes
kubectl get pods -l app=my-app

Output:

CONTROLLER   CONTAINER(S)   IMAGE(S)           SELECTOR     REPLICAS
my-app-v1    my-app         <docker image>     <selector>   0

NAME                 READY     STATUS    RESTARTS   AGE  NODE
my-app-v1-a12da      1/1       Running   0          5d   node1

One of the two pods was deleted properly, while the other remains running. I have tried this several times and have consistently had problems only with node1.

What I have tried to fix it:

I ssh'ed into node1 and restarted kubelet. This deleted the pod that had been lingering, but when I try to delete another pod on that node I still have to restart kubelet to get it to work.
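The restart workaround above can be sketched as a small helper to run on the affected node. The systemd unit name "kubelet" and the use of systemctl are assumptions; on this CoreOS-era install kubelet may be managed by a different init system, so adjust accordingly:

```shell
# Sketch of the workaround, under the assumption that kubelet runs as a
# systemd unit named "kubelet". Not a fix, just what unsticks the node.
restart_and_check() {
  sudo systemctl restart kubelet   # restart the stuck kubelet
  sleep 10                         # give the sync loop time to process pending work
  docker ps | grep my-app          # lingering container should no longer appear
}
# restart_and_check   # invoke on the affected node
```

The drawback, as noted above, is that the restart only helps once; the next pod deletion gets stuck again.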

I think the kubelet loop is stuck somewhere and only makes it through a few iterations before hanging.

I just turned on verbose logging, but I'm not sure what I should look for.
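One way to check for sync-loop activity is to filter the kubelet logs for its sync-related messages. The sample log line below is illustrative only (the exact wording varies by kubelet version), and the journalctl unit name is an assumption:

```shell
# Pattern matching kubelet's sync-loop messages (visible at higher -v levels).
PATTERN='SyncLoop|syncPod'

# Illustrative line of the rough shape kubelet emits when the loop is healthy;
# this is an assumed example, not captured output.
SAMPLE='kubelet.go:1234] SyncLoop (ADD): "my-app-v1-a12da_default"'
echo "$SAMPLE" | grep -E "$PATTERN"

# Against a live node, follow the journal instead (assumes a systemd unit
# named "kubelet"):
#   journalctl -u kubelet -f | grep -E "$PATTERN"
```

If nothing matches for minutes after a pod is added or deleted, the sync loop is not running.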

Update

This also applies to containers scheduled to node1. Their images are never pulled, and they are never started.

node1 has worked in the past; I only started running into this problem last night.

Kubelet version

admin@node1 ~ $ /opt/bin/kubelet --version=true
Kubernetes v1.1.1

Kubectl version

kubectl version
Client Version: version.Info{Major:"1", Minor:"0", GitVersion:"v1.0.6", GitCommit:"388061f00f0d9e4d641f9ed4971c775e1654579d", GitTreeState:"clean"}
Server Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.1", GitCommit:"92635e23dfafb2ddc828c8ac6c03c7a7205a84d8", GitTreeState:"clean"}
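The output above already shows the skew (client v1.0.6 talking to a v1.1.1 server). A quick way to surface just the version fields, demonstrated here on the captured output from the question:

```shell
# Extract only the GitVersion fields to make client/server skew obvious.
VERSIONS='Client Version: version.Info{Major:"1", Minor:"0", GitVersion:"v1.0.6"}
Server Version: version.Info{Major:"1", Minor:"1", GitVersion:"v1.1.1"}'
echo "$VERSIONS" | grep -o 'GitVersion:"[^"]*"'

# On a live cluster:
#   kubectl version | grep -o 'GitVersion:"[^"]*"'
```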

Log excerpts: where is SyncLoop?

8154 config.go:382] Receiving a new pod "my-app-v1-a12da_default"
...
8154 server.go:944] GET /stats/default/my-app-v1-a12da/<some uuid>/app-container: (75.513µs) 404 [[Go 1.1 package http]

Normally the SyncLoop would pick this up and perform the necessary docker operations to start the container. But there has been no sync-loop activity after "Receiving a new pod", and none in the 50 minutes since I restarted kubelet.

As pointed out by @yu-ju-hong, this was due to a bug in Kubernetes 1.1.1 in handling version-skewed clusters. Upgrade the master to a newer version, such as Kubernetes 1.1.7, and, ideally, upgrade the nodes to the same version as soon as possible.
