I have a working two-node k8s cluster. I added another node to the cluster, and the sudo kubeadm join ... command reported that the node had joined the cluster. However, the new node is stuck in the NotReady state:
kubectl get nodes
NAME               STATUS     ROLES    AGE    VERSION
msi-ubuntu18       NotReady   <none>   29m    v1.19.0
tv                 Ready      master   131d   v1.18.6
ubuntu-18-extssd   Ready      <none>   131d   v1.17.4
Running journalctl -u kubelet shows this error:
Started kubelet: The Kubernetes Node Agent.
22039 server.go:198] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/l...
But the file /var/lib/kubelet/config.yaml exists and looks OK.
Running sudo systemctl status kubelet shows a different error:
kubelet.go:2103] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plu
cni.go:239] Unable to update cni config: no networks found in /etc/cni/net.d
And there is no /etc/cni/ directory on the new node. (The existing node has /etc/cni/net.d/ with calico files in it.) If I run
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
on the master again it doesn't solve the problem. There is still no /etc/cni/ dir on the new node.
I must have missed a step when creating the new node. How do I get the /etc/cni/ directory on the new node? It's also puzzling that the kubeadm join ... command indicates success when the new node is stuck in NotReady.
For anyone else running into this problem, I was finally able to solve this by doing
kubectl delete -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
followed by
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
There was presumably some version incompatibility between Calico v3.11, which I had installed a few months ago, and the new node (which runs a newer kubelet than the existing nodes).
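To confirm that the re-applied manifest actually fixed things, a check along these lines may help. This is only a sketch: not_ready_nodes is a hypothetical helper name, and the k8s-app=calico-node label in the usage comment is assumed to match the label used by the Calico manifest you applied.

```shell
#!/bin/sh
# Hypothetical helper: reads `kubectl get nodes` output on stdin and prints
# the name of every node whose STATUS column is not Ready.
# "Ready,SchedulingDisabled" is still treated as Ready.
not_ready_nodes() {
    awk 'NR > 1 && $2 !~ /^Ready(,|$)/ { print $1 }'
}

# Usage (assumes kubectl is configured for the cluster):
#   kubectl get nodes | not_ready_nodes
#
# To see whether a Calico pod was scheduled on each node
# (the label is an assumption, check your manifest):
#   kubectl get pods -n kube-system -l k8s-app=calico-node -o wide
```

If the helper prints nothing, every node reports Ready. If the new node is still listed, the pod listing should show whether a Calico pod is running there at all.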
I just ran into a similar situation, but the proximate cause was at a higher level.
Basically, I applied some Gatekeeper security policies to the kube-system namespace without recognizing that I'd have to make exceptions for kube-proxy and aws-node (this was in EKS).
A couple of examples from the kube event logs:
[denied by psp-pods-allowed-user-ranges] Container kube-proxy is attempting to run without a required securityContext/runAsGroup. Allowed runAsGroup: {"ranges": [{"max": 65535, "min": 1}], "rule": "MustRunAs"}
[denied by caps-constraints] container <kube-proxy> is not dropping all required capabilities. Container must drop all of ["ALL"]
[denied by psp-hostfs-constraints] HostPath volume {"name": "xtables-lock", "hostPath": {"path": "/run/xtables.lock", "type": "FileOrCreate"}} is not allowed, pod: kube-proxy-j5h2d. Allowed path: [{"pathPrefix": "/tmp", "readOnly": true}]
I didn't notice this for a solid month after I'd applied the changes; it only showed up after one of my EKS nodes restarted for some reason.
Posting here in hopes it might save somebody else the day I lost.
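For anyone who wants to check for this quickly, here is a rough sketch that pulls the constraint names out of messages shaped like the ones above. denied_constraints is a hypothetical helper name, and it assumes the denial text keeps the "[denied by <name>]" form shown in my events.

```shell
#!/bin/sh
# Hypothetical helper: reads event text on stdin and prints the unique
# constraint names that appear in "[denied by <name>]" messages.
denied_constraints() {
    sed -n 's/.*\[denied by \([^]]*\)].*/\1/p' | LC_ALL=C sort -u
}

# Usage (assumes kubectl is configured for the cluster):
#   kubectl get events -n kube-system | denied_constraints
```

Any output means at least one policy is rejecting pods in that namespace, which is worth looking at before a node restart turns it into an outage.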