[英]Kubernetes Calico node 'XXXXXXXXXXX' already using IPv4 Address XXXXXXXXX, CrashLoopBackOff
I used the AWS Kubernetes Quickstart to create a Kubernetes cluster in a VPC and private subnet: https://aws-quickstart.s3.amazonaws.com/quickstart-heptio/doc/heptio-kubernetes-on-the-aws-cloud.pdf .我使用 AWS Kubernetes 快速入门在 VPC 和私有子网中创建 Kubernetes 集群: https://aws-quickstart.s3.amazonaws.com/quickstart-heptio/doc/heptio-kubernetes-on-the-aws-cloud。 .pdf It was running fine for a while.它运行了一段时间。 I have Calico installed on my Kubernetes cluster.我的 Kubernetes 集群上安装了 Calico。 I have two nodes and a master.我有两个节点和一个主节点。 The calico pods on the master are running fine, the ones on the nodes are in crashloopbackoff state:主节点上的 calico pod 运行良好,节点上的 calico pod 处于 crashloopbackoff 状态:
NAME READY STATUS RESTARTS AGE
calico-etcd-ztwjj 1/1 Running 1 55d
calico-kube-controllers-685755779f-ftm92 1/1 Running 2 55d
calico-node-gkjgl 1/2 CrashLoopBackOff 270 22h
calico-node-jxkvx 2/2 Running 4 55d
calico-node-mxhc5 1/2 CrashLoopBackOff 9 25m
Describing one of the crashed pods:描述其中一个坠毁的吊舱:
ubuntu@ip-10-0-1-133:~$ kubectl describe pod calico-node-gkjgl -n kube-system
Name: calico-node-gkjgl
Namespace: kube-system
Node: ip-10-0-0-237.us-east-2.compute.internal/10.0.0.237
Start Time: Mon, 17 Sep 2018 16:56:41 +0000
Labels: controller-revision-hash=185957727
k8s-app=calico-node
pod-template-generation=1
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 10.0.0.237
Controlled By: DaemonSet/calico-node
Containers:
calico-node:
Container ID: docker://d89979ba963c33470139fd2093a5427b13c6d44f4c6bb546c9acdb1a63cd4f28
Image: quay.io/calico/node:v3.1.1
Image ID: docker-pullable://quay.io/calico/node@sha256:19fdccdd4a90c4eb0301b280b50389a56e737e2349828d06c7ab397311638d29
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Tue, 18 Sep 2018 15:14:44 +0000
Finished: Tue, 18 Sep 2018 15:14:44 +0000
Ready: False
Restart Count: 270
Requests:
cpu: 250m
Liveness: http-get http://:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: http-get http://:9099/readiness delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
ETCD_ENDPOINTS: <set to the key 'etcd_endpoints' of config map 'calico-config'> Optional: false
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: kubeadm,bgp
CALICO_DISABLE_FILE_LOGGING: true
CALICO_K8S_NODE_REF: (v1:spec.nodeName)
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
CALICO_IPV4POOL_CIDR: 192.168.0.0/16
CALICO_IPV4POOL_IPIP: Always
FELIX_IPV6SUPPORT: false
FELIX_IPINIPMTU: 1440
FELIX_LOGSEVERITYSCREEN: info
IP: autodetect
FELIX_HEALTHENABLED: true
Mounts:
/lib/modules from lib-modules (ro)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
install-cni:
Container ID: docker://b37e0ec7eba690473a4999a31d9f766f7adfa65f800a7b2dc8e23ead7520252d
Image: quay.io/calico/cni:v3.1.1
Image ID: docker-pullable://quay.io/calico/cni@sha256:dc345458d136ad9b4d01864705895e26692d2356de5c96197abff0030bf033eb
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Running
Started: Mon, 17 Sep 2018 17:11:52 +0000
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Mon, 17 Sep 2018 16:56:43 +0000
Finished: Mon, 17 Sep 2018 17:10:53 +0000
Ready: True
Restart Count: 1
Environment:
CNI_CONF_NAME: 10-calico.conflist
ETCD_ENDPOINTS: <set to the key 'etcd_endpoints' of config map 'calico-config'> Optional: false
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
calico-cni-plugin-token-b7sfl:
Type: Secret (a volume populated by a Secret)
SecretName: calico-cni-plugin-token-b7sfl
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: :NoSchedule
:NoExecute
:NoSchedule
:NoExecute
CriticalAddonsOnly
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning BackOff 4m (x6072 over 22h) kubelet, ip-10-0-0-237.us-east-2.compute.internal Back-off restarting failed container
The logs for the same pod:同一 pod 的日志:
ubuntu@ip-10-0-1-133:~$ kubectl logs calico-node-gkjgl -n kube-system -c calico-node
2018-09-18 15:14:44.605 [INFO][8] startup.go 251: Early log level set to info
2018-09-18 15:14:44.605 [INFO][8] startup.go 269: Using stored node name from /var/lib/calico/nodename
2018-09-18 15:14:44.605 [INFO][8] startup.go 279: Determined node name: ip-10-0-0-237.us-east-2.compute.internal
2018-09-18 15:14:44.609 [INFO][8] startup.go 101: Skipping datastore connection test
2018-09-18 15:14:44.610 [INFO][8] startup.go 352: Building new node resource Name="ip-10-0-0-237.us-east-2.compute.internal"
2018-09-18 15:14:44.610 [INFO][8] startup.go 367: Initialize BGP data
2018-09-18 15:14:44.614 [INFO][8] startup.go 564: Using autodetected IPv4 address on interface ens3: 10.0.0.237/19
2018-09-18 15:14:44.614 [INFO][8] startup.go 432: Node IPv4 changed, will check for conflicts
2018-09-18 15:14:44.618 [WARNING][8] startup.go 861: Calico node 'ip-10-0-0-237' is already using the IPv4 address 10.0.0.237.
2018-09-18 15:14:44.618 [WARNING][8] startup.go 1058: Terminating
Calico node failed to start
So it seems like there is a conflict finding the node IP address, or Calico seems to think the IP is already assigned to another node.因此,查找节点 IP 地址似乎存在冲突,或者 Calico 似乎认为 IP 已分配给另一个节点。 Doing a quick search i found this thread: https://github.com/projectcalico/calico/issues/1628 .快速搜索我发现了这个线程: https ://github.com/projectcalico/calico/issues/1628。 I see that this should be resolved by setting the IP_AUTODETECTION_METHOD to can-reach=DESTINATION, which I'm assuming would be "can-reach=10.0.0.237".我看到这应该通过将 IP_AUTODETECTION_METHOD 设置为 can-reach=DESTINATION 来解决,我假设它是“can-reach=10.0.0.237”。 This config is an environment variable set on calico/node container.此配置是在 calico/node 容器上设置的环境变量。 I have been attempting to shell into the container itself, but kubectl tells me the container is not found:我一直在尝试进入容器本身,但 kubectl 告诉我找不到容器:
ubuntu@ip-10-0-1-133:~$ kubectl exec calico-node-gkjgl --stdin --tty /bin/sh -c calico-node -n kube-system
error: unable to upgrade connection: container not found ("calico-node")
I'm suspecting this is due to Calico being unable to assign IPs.我怀疑这是由于 Calico 无法分配 IP。 So I logged onto the host and attempt to shell on the container using docker:所以我登录到主机并尝试使用 docker 在容器上运行:
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/bash
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory"
So I guess there is no shell to execute in the container.所以我猜容器中没有要执行的shell。 Makes sense why Kubernetes couldn't execute that. Kubernetes 无法执行的原因是有道理的。 I tried running commands externally to list environment variables, but I haven't been able to find any, I could be running these commands wrong however:我尝试在外部运行命令以列出环境变量,但我找不到任何命令,但是我可能会错误地运行这些命令:
root@ip-10-0-0-237:~# docker inspect -f '{{range $index, $value := .Config.Env}}{{$value}} {{end}}' k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 printenv IP_AUTODETECTION_METHOD
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"printenv\": executable file not found in $PATH"
root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/env
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/env\": stat /bin/env: no such file or directory"
Okay, so maybe I am going about this the wrong way.好的,所以也许我正在以错误的方式解决这个问题。 Should I attempt to change the Calico config files using Kubernetes and redeploy it?我应该尝试使用 Kubernetes 更改 Calico 配置文件并重新部署它吗? Where can I find these on my system?我在哪里可以在我的系统上找到这些? I haven't been able to find where to set the environment variables.我一直无法找到在哪里设置环境变量。
If you look at the Calico docs IP_AUTODETECTION_METHOD
is already defaulting to first-round
.如果您查看Calico 文档IP_AUTODETECTION_METHOD
已经默认为first-round
。
My guess is that something or the IP address is not being released by the previous 'run' of calico, or just simply a bug in the v3.1.1
version of calico.我的猜测是之前的“运行”calico 没有释放某些东西或 IP 地址,或者只是 calico 的v3.1.1
版本中的一个错误。
Try:尝试:
Delete your Calico pods that are in a CrashBackOff loop删除 CrashBackOff 循环中的 Calico pod
kubectl -n kube-system delete calico-node-gkjgl calico-node-mxhc5
Your pods will be re-created and hopefully initialize.您的 pod 将被重新创建并希望初始化。
Upgrade Calico to v3.1.3
or latest.将 Calico 升级到v3.1.3
或最新版本。 Follow these docs My guess is that Heptio's Calico installation is using the etcd datastore.按照这些文档我的猜测是 Heptio 的 Calico 安装正在使用 etcd 数据存储。
Try to understand how Heptio's AWS AMIs work and see if there are any issues with them.尝试了解 Heptio 的 AWS AMI 的工作原理,看看它们是否存在任何问题。 This might take some time so you could contact their support as well.这可能需要一些时间,因此您也可以联系他们的支持。
Try a different method to install Kubernetes with Calico.尝试不同的方法来使用 Calico 安装 Kubernetes。 Well documented on https://kubernetes.io在https://kubernetes.io上有很好的记录
For me what worked was to remove left over docker-networks on the Nodes.对我来说,有效的是删除节点上剩余的 docker-networks。
I had to list out current networks on each Node: docker network list
and then remove the unneeded ones: docker network rm <networkName>
.我必须列出每个节点上的当前网络: docker network list
,然后删除不需要的网络: docker network rm <networkName>
。
After doing that the calico deployment pods were running fine之后,印花布部署吊舱运行良好
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.