
After upgrading to kubelet v1.24, the cluster does not start

After running apt update && apt upgrade, kubelet no longer starts. In journalctl it prints kubelet's help text and complains about an unsupported --network-plugin flag.

It looks like the cluster broke after the upgrade to kubelet 1.24.0:

root@netikras-hub:/etc/systemd/system/kubelet.service.d# kubelet --version
Kubernetes v1.24.0
root@netikras-hub:/etc/systemd/system/kubelet.service.d# kubelet --help | grep network-plugin
root@netikras-hub:/etc/systemd/system/kubelet.service.d# 
root@netikras-hub:/etc/systemd/system/kubelet.service.d# kubelet --network-plugin=cni 2>&1 | head -3
Error: failed to parse kubelet flag: unknown flag: --network-plugin
Usage:
  kubelet [flags]

The flag is still accepted on 1.20.4:

[root@CentOS-83-64-minimal ~]# kubelet --version
Kubernetes v1.20.4
[root@CentOS-83-64-minimal ~]# kubelet --help | grep network-plugin
      --network-plugin string                                    The name of the network plugin to be invoked for various events in kubelet/pod lifecycle. This docker-specific flag only works when container-runtime is set to docker.
      --network-plugin-mtu int32                                 The MTU to be passed to the network plugin, to override the default. Set to 0 to use the default 1460 MTU. This docker-specific flag only works when container-runtime is set to docker.
[root@CentOS-83-64-minimal ~]# 

I found that the v1.24 documentation still refers to the network-plugin flag, and I raised an issue to get the docs updated (see this ticket). However, the folks there are only interested in updating the docs, not in guiding me through my cluster recovery options.

What is the easiest way to recover? I'm using flannel as my CNI.

My understanding is that after the dockershim removal, all container runtimes are CNI-aware, so I would expect them to use the standard /etc/cni/net.d/ mechanism to identify the CNI plugin without needing the previous hints.

If you have a correct /etc/cni/net.d/nn-provider.conflist and the plugin binaries in /opt/cni/bin, you can simply remove the offending kubelet flags and it "should just work".
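Before removing the flags, it may help to confirm that both preconditions hold. The following is a minimal sketch, not part of the original answer: check_cni is a hypothetical helper name, and it only checks that a config file and at least one executable exist, not that their contents are valid.

```shell
# Hypothetical helper: report whether a CNI config and plugin binaries exist.
# Defaults are the two paths named in the answer; both can be overridden.
check_cni() {
  local conf_dir="${1:-/etc/cni/net.d}"
  local bin_dir="${2:-/opt/cni/bin}"
  local status=0

  # Look for at least one .conf or .conflist file in the config directory.
  if find "$conf_dir" -maxdepth 1 -type f \( -name '*.conflist' -o -name '*.conf' \) 2>/dev/null | grep -q .; then
    echo "config: found in $conf_dir"
  else
    echo "config: no .conf/.conflist in $conf_dir"
    status=1
  fi

  # Look for at least one executable regular file in the plugin directory.
  if find "$bin_dir" -maxdepth 1 -type f -perm -u+x 2>/dev/null | grep -q .; then
    echo "plugins: found in $bin_dir"
  else
    echo "plugins: no executables in $bin_dir"
    status=1
  fi

  return "$status"
}
```

Calling check_cni with no arguments inspects the default locations; a non-zero exit status means at least one of the two checks failed.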

If this doesn't work, I would suggest looking at your flannel daemonset manifest to see where it expects the CNI bin directory to be.
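One way to do that without guessing the daemonset's name or namespace is to dump all daemonsets and filter for host paths that mention "cni". This is only a sketch: cni_host_paths is a hypothetical helper, and it assumes the manifest declares its directories as hostPath volumes with a `path:` key.

```shell
# Hypothetical helper: read a YAML manifest on stdin and print every
# "path:" value that contains "cni" (case-sensitive), one per line.
cni_host_paths() {
  sed -nE 's/^[[:space:]]*path:[[:space:]]*(.*cni.*)$/\1/p'
}

# Possible usage against a live cluster (needs a working kubectl context):
#   kubectl get daemonset -A -o yaml | cni_host_paths
```

If the paths printed differ from /opt/cni/bin and /etc/cni/net.d, the container runtime may be looking in a different place than the one flannel installs into.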

As a workaround until this is fixed in kubeadm, you can remove the network-related flags by running the following command after kubeadm:

echo "KUBELET_NETWORK_ARGS=''" | sudo tee --append /var/lib/kubelet/kubeadm-flags.env
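If the flag turns out to sit inside another variable in that file (for example KUBELET_KUBEADM_ARGS, which is an assumption about your setup), an alternative is to delete the token itself. The sketch below is not from the answer above; strip_network_plugin_flag is a hypothetical helper that keeps a .bak copy of the file before rewriting it.

```shell
# Hypothetical helper: remove every --network-plugin=<value> token from a
# kubelet flags file, keeping the original as <file>.bak.
strip_network_plugin_flag() {
  local file="$1"

  # Back up first so the change is easy to revert.
  cp -p "$file" "$file.bak" || return 1

  # Drop the token together with any whitespace in front of it.
  sed -E 's/[[:space:]]*--network-plugin=[^[:space:]"]+//g' "$file.bak" > "$file"
}

# Possible usage (as root), followed by a kubelet restart:
#   strip_network_plugin_flag /var/lib/kubelet/kubeadm-flags.env
#   systemctl restart kubelet
```

The pattern requires an "=" directly after the flag name, so other flags in the file are left alone.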

kubelet v1.24 needs a CRI container runtime such as containerd, because dockershim (kubelet's built-in Docker support) was removed in that release. To fix it:

  1. Overwrite /var/lib/kubelet/kubeadm-flags.env so that kubelet points at containerd:

cat <<EOF | sudo tee /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=k8s.gcr.io/pause:3.7"
EOF

  2. Reload systemd and restart kubelet:

systemctl daemon-reload && systemctl restart kubelet

Note: you can disable docker. If you had applied any custom configuration to docker, you should apply the equivalent configuration to containerd in /etc/containerd/config.toml.
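Before restarting kubelet with the flags from the steps above, it can be worth checking that the socket named in --container-runtime-endpoint actually exists. This is a rough sketch with hypothetical helper names; it only handles unix:// endpoints and does not verify that containerd is answering on the socket.

```shell
# Hypothetical helper: print the filesystem path of the unix:// endpoint
# given by --container-runtime-endpoint in a kubelet flags file.
runtime_endpoint_path() {
  sed -nE 's/.*--container-runtime-endpoint=unix:\/\/([^[:space:]"]+).*/\1/p' "$1"
}

# Hypothetical helper: succeed only if that path exists and is a socket.
runtime_socket_ready() {
  local path
  path="$(runtime_endpoint_path "$1")"
  [ -n "$path" ] && [ -S "$path" ]
}

# Possible usage:
#   runtime_socket_ready /var/lib/kubelet/kubeadm-flags.env || echo "CRI socket missing"
```

If the check fails, containerd is probably not running or listens on a different socket, and kubelet would fail again after the restart.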


 