
Adding a node to a Kubernetes cluster gives "failed to load Kubelet config file /var/lib/kubelet/config.yaml" and "no networks found in /etc/cni/net.d"

I have a working two-node k8s cluster. I added another node to the cluster, and the sudo kubeadm join ... command reported that the node had joined. The new node is stuck in the NotReady state:

kubectl get nodes
NAME               STATUS     ROLES    AGE    VERSION
msi-ubuntu18       NotReady   <none>   29m    v1.19.0
tv                 Ready      master   131d   v1.18.6
ubuntu-18-extssd   Ready      <none>   131d   v1.17.4
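
A quick way to see why the node is NotReady is to look at its conditions (node name taken from the output above):

kubectl describe node msi-ubuntu18

In this situation the Ready condition typically reports that the container runtime network is not ready.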

Running journalctl -u kubelet shows this error:

Started kubelet: The Kubernetes Node Agent.
  22039 server.go:198] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/l...

But the file /var/lib/kubelet/config.yaml exists and looks OK.

Running sudo systemctl status kubelet shows a different error:

kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plu
cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d

And there is no /etc/cni/ directory on the new node. (The existing node has /etc/cni/net.d/ with calico files in it.) If I run

kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml

on the master again, it doesn't solve the problem. There is still no /etc/cni/ directory on the new node.

I must have missed a step when creating the new node. How do I get the /etc/cni/ directory onto the new node? It's also puzzling that the kubeadm join ... command reports success even though the new node is stuck in NotReady.
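
For what it's worth, kubeadm join reports success once the node has registered and the TLS bootstrap has completed; becoming Ready additionally requires a working CNI plugin, which is why the join can look fine while the node stays NotReady. Assuming Calico is the CNI here (as on the existing nodes), the files in /etc/cni/net.d are written by the calico-node pod's install-cni init container, so if that pod never starts on the new node the directory never appears. A sketch of the checks, with the pod name as a placeholder:

kubectl -n kube-system get pods -o wide | grep calico-node
kubectl -n kube-system describe pod <calico-node-pod-on-msi-ubuntu18>

The first command shows whether a calico-node pod was scheduled on msi-ubuntu18 at all; the second shows why it is failing if it was.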

For anyone else running into this problem, I was finally able to solve this by doing

kubectl delete -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml

followed by

kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml

There must have been some incompatibility between Calico version 3.11, which I had installed a few months earlier, and the new node.
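
To confirm that the reinstall actually reached the new node, one check (assuming the stock manifest's k8s-app=calico-node label) is that a calico-node pod is Running on msi-ubuntu18 and that the CNI config has been written there:

kubectl -n kube-system get pods -l k8s-app=calico-node -o wide
ls /etc/cni/net.d/

Once that pod is Running on the new node, the node should move to Ready shortly afterwards.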

I also encountered the same situation when I initialized the cluster with a custom pod CIDR: kubeadm init --pod-network-cidr=10.10.0.0/16


But running the kubectl get pods --all-namespaces command helped to fix the issue.
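
One thing worth double-checking with a custom --pod-network-cidr is that the CNI plugin's pool matches it. For Calico, the manifest's CALICO_IPV4POOL_CIDR setting defaults to 192.168.0.0/16 (commented out in recent manifests), so a common step (a sketch, assuming the manifest URL used in the other answer) is:

curl -LO https://docs.projectcalico.org/manifests/calico.yaml

then edit calico.yaml, set CALICO_IPV4POOL_CIDR to "10.10.0.0/16" in the calico-node container env, and

kubectl apply -f calico.yaml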


I just ran through a similar situation, but the proximate cause was at a higher level.

Basically I applied some Gatekeeper security policies to the kube-system namespace without recognizing I'd have to make exceptions for kube-proxy and aws-node (this was in EKS).

A couple of examples from the kube event logs:

[denied by psp-pods-allowed-user-ranges] Container kube-proxy is attempting to run without a required securityContext/runAsGroup. Allowed runAsGroup: {"ranges": [{"max": 65535, "min": 1}], "rule": "MustRunAs"}
[denied by caps-constraints] container <kube-proxy> is not dropping all required capabilities. Container must drop all of ["ALL"]
[denied by psp-hostfs-constraints] HostPath volume {"name": "xtables-lock", "hostPath": {"path": "/run/xtables.lock", "type": "FileOrCreate"}} is not allowed, pod: kube-proxy-j5h2d. Allowed path: [{"pathPrefix": "/tmp", "readOnly": true}]

I didn't notice this for a solid month after I'd applied the changes; it only showed up after one of my EKS nodes restarted for some reason.

Posting here in hopes it might save somebody else the day I lost.
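
If you suspect the same thing, the denials show up as events when the DaemonSets try to recreate the blocked pods, and Gatekeeper constraints can exclude namespaces. A sketch, with the constraint kind as a placeholder (excludedNamespaces is standard Gatekeeper match syntax, but check it against your constraint template):

kubectl get events -n kube-system | grep denied
kubectl edit <constraint-kind> psp-pods-allowed-user-ranges

and add kube-system under spec.match.excludedNamespaces in the constraint.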
