failed to load kubelet config file
Hey folks,
After updating my server, I can't restart Kubernetes.
Feb 6 10:34:26 chgvas99 kubelet: F0206 10:34:26.662744 27634 server.go:189] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Feb 6 10:34:26 chgvas99 systemd: kubelet.service: main process exited, code=exited, status=255/n/a
Feb 6 10:34:26 chgvas99 systemd: Unit kubelet.service entered failed state.
Feb 6 10:34:26 chgvas99 systemd: kubelet.service failed.
I checked the directory and indeed there is no config.yaml. I have the same error on my nodes; I can't restart them.
server: 3.10.0-957.5.1.el7.x86_64
kubernetes: Major:"1", Minor:"13", GitVersion:"v1.13.3" GoVersion:"go1.11.5"
You're using kubeadm, so the fact that /var/lib/kubelet/config.yaml
is missing is probably related to the worker node not being joined to the cluster.
This might be related to networking issues, but let's try it step by step:
1 ) Create a valid token for the worker node to join the cluster. Run:
sudo kubeadm token create --print-join-command --v=5
and make sure you receive an output command like:
kubeadm join <master-node-ip>:6443 --token aa334.. --discovery-token-ca-cert-hash sha256:..
2 ) Run the provided command on the worker node.
3 ) If everything is OK, /var/lib/kubelet/config.yaml
should be populated and the output of sudo systemctl status kubelet
should look good.
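That check can be done with something like the following (paths are the kubeadm defaults; assuming a systemd-managed kubelet):

```shell
# Confirm the config file kubeadm writes on join is now present
ls -l /var/lib/kubelet/config.yaml

# Check the service state and the most recent kubelet log lines
sudo systemctl status kubelet --no-pager
sudo journalctl -u kubelet -n 20 --no-pager
```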
4 ) If you get an error, try running the same join command with --v=5
- you'll probably see some networking issues.
4.A ) If you got an error like dial tcp <master-ip>:6443: connect: no route to host
- make sure there is communication between your nodes: run curl <master-node-ip>:6443
from the worker node; you'll probably get the same "no route"
error.
Go to the master node and open port 6443
(I'll assume you're working on a secured private network) and try the connectivity again.
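Since the kernel above is CentOS 7, the port is most likely managed by firewalld; a sketch, assuming firewalld is the active firewall on the master:

```shell
# On the master node: allow the API server port through firewalld
sudo firewall-cmd --permanent --add-port=6443/tcp
sudo firewall-cmd --reload

# From the worker node: re-test connectivity
curl <master-node-ip>:6443
```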
4.B ) If opening the port on the master succeeds and you're able to curl from worker to master, you should receive a response from the API server like: Client sent an HTTP request to an HTTPS server
.
5 ) If curl
succeeds but you're still facing connectivity problems, try:
5.A ) Compare the .kube/config
files of the master and worker nodes, and make sure the IP of the API server is correct.
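The API server address each node is using can be read straight from the kubeconfig, for example:

```shell
# Quick look at the server line in the kubeconfig
grep server ~/.kube/config

# Or ask kubectl directly for the first cluster's API server URL
kubectl config view -o jsonpath='{.clusters[0].cluster.server}'
```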
5.B ) Make sure you enabled bridge networking mode on all nodes: sudo sysctl net.bridge.bridge-nf-call-iptables=1
.
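Note that a plain sysctl call does not survive a reboot, and on some kernels the br_netfilter module must be loaded before that key exists; one way to make both persistent (the k8s.conf file names are arbitrary):

```shell
# Load the bridge netfilter module now and on every boot
sudo modprobe br_netfilter
echo br_netfilter | sudo tee /etc/modules-load.d/k8s.conf

# Persist the sysctl setting and apply it
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system
```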
5.C ) Make sure you have an SDN solution like Calico, Flannel, or Weave, and that you see the relevant kube-system pods running:
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-f9fd979d6-lpdlc 1/1 Running 2 7d12h
coredns-f9fd979d6-vcs7g 1/1 Running 2 7d12h
etcd-master-node-k8s 1/1 Running 2 7d12h
kube-apiserver-master-node-k8s 1/1 Running 2 7d12h
kube-controller-manager-master-node-k8s 1/1 Running 2 7d12h
kube-proxy-kh2lc 1/1 Running 2 7d12h
kube-proxy-lfmc4 1/1 Running 0 4m36s
kube-scheduler-master-node-k8s 1/1 Running 2 7d12h
weave-net-59r5b 2/2 Running 6 7d11h <-- Here
weave-net-c44d6 2/2 Running 1 4m36s <-- Here
6 ) If nothing works, try running kubeadm reset
on the worker node.
My env: 3 masters behind a front load balancer while initializing the cluster.
I had to adjust my load balancer, take out the other master nodes, and run "kubeadm init", and the issue was gone; possibly a network error.
Then re-add all the other master nodes.
Had a similar issue where I got that error on the control-plane join command. It ended up being the load balancer IP assigned to the joining control-plane node as a secondary address from a previous installation (which had nothing running behind it). Run
ip a
and make sure that the joining node does not carry the secondary load balancer IP. A cleanup sequence like the following (with Ansible variables for the VIP and interface) worked for me:
echo y | kubeadm reset || true                  # wipe kubeadm state on the node
rm -rf /etc/cni/net.d || true                   # remove leftover CNI configuration
rm -rf /var/lib/etcd || true                    # remove leftover etcd data
rm -rf ~/.kube || true                          # remove the old kubectl config
ip address delete {{ k8s_apiserver_vip_cidr }} dev {{ k8s_interface }}   # drop the stale load-balancer VIP
kubeadm join ...