Kops rolling-update fails with "Cluster did not pass validation" for master node
For some reason my master node can no longer connect to my cluster after upgrading from Kubernetes 1.11.9 to 1.12.9 via kops (version 1.13.0). In the manifest I'm changing

kubernetesVersion

from 1.11.9 to 1.12.9. This is the only change I'm making. However, when I run

kops rolling-update cluster --yes

I get the following error:
Cluster did not pass validation, will try again in "30s" until duration "5m0s" expires: machine "i-01234567" has not yet joined cluster.
Cluster did not validate within 5m0s
After that, if I run

kubectl get nodes

I no longer see that master node in my cluster.
Doing a little debugging by SSHing into the disconnected master node instance, I found the following error in my api-server log by running

sudo cat /var/log/kube-apiserver.log
controller.go:135] Unable to perform initial IP allocation check: unable to refresh the service IP block: client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:4001: connect: connection refused
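That api-server error means nothing answered on 127.0.0.1:4001, the legacy etcd client port that kops-managed etcd listens on in clusters of this vintage. A quick, illustrative way to confirm from the master itself (using bash's /dev/tcp redirection; the port number is taken from the error above):

```shell
#!/bin/bash
# Illustrative check: is anything listening on etcd's client port 4001?
port_open() {
  # returns 0 if a TCP connection to 127.0.0.1:$1 succeeds
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

if port_open 4001; then
  status="etcd client port 4001 is reachable"
else
  status="etcd client port 4001 refused connection -- etcd is not serving"
fi
echo "$status"
```

If the connection is refused, the api-server error is just a symptom and the real problem is that etcd itself is not running.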
I suspect the issue might be related to etcd, because when I run

sudo netstat -nap | grep LISTEN | grep etcd

there is no output.
Does anyone have an idea how I can get my master node back into the cluster, or advice on things to try?
I have done some research and have a few ideas for you:
If there is no output for the etcd grep, it means that your etcd server is down. Look for the 'Exited' etcd container, e.g. with something like

docker ps -a | grep Exited | grep etcd

and then inspect its logs with

docker logs <etcd-container-id>
Try these instructions I found:
1 - I removed the old master from the etcd cluster using etcdctl. You will need to connect to the etcd-server container to do this.
2 - On the new master node I stopped the kubelet and protokube services.
3 - Empty the etcd data dirs. (data and data-events)
4 - Edit /etc/kubernetes/manifests/etcd.manifest and etcd-events.manifest, changing ETCD_INITIAL_CLUSTER_STATE from new to existing.
5 - Get the name and PeerURLs from the new master and use etcdctl to add the new master to the cluster. (etcdctl member add "name" "PeerURL") You will need to connect to the etcd-server container to do this.
6 - Start the kubelet and protokube services on the new master.
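The steps above could be sketched roughly as follows. This is a non-executing outline only: the commands are collected in a function that is never called, and the data-dir paths, member IDs, and peer URLs are illustrative placeholders you must adapt to your cluster before running anything.

```shell
#!/bin/bash
# Sketch of the recovery procedure above. NOT run automatically; adapt first.
recover_master_etcd() {
  # 1. On a healthy master, inside the etcd-server container, drop the old member:
  #      etcdctl member list
  #      etcdctl member remove <old-member-id>

  # 2. On the new master, stop the services that supervise etcd:
  sudo systemctl stop kubelet protokube

  # 3. Empty the etcd data dirs (locations vary by setup; these are illustrative):
  sudo rm -rf /var/etcd/data/* /var/etcd/data-events/*

  # 4. In both manifests, change ETCD_INITIAL_CLUSTER_STATE from new to existing:
  sudo vi /etc/kubernetes/manifests/etcd.manifest
  sudo vi /etc/kubernetes/manifests/etcd-events.manifest

  # 5. Inside the etcd-server container, re-add the new master:
  #      etcdctl member add "<name>" "<peer-url>"

  # 6. Bring the services back up:
  sudo systemctl start kubelet protokube
}
```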
Please let me know if that helped.