
Kubernetes Calico node 'XXXXXXXXXXX' already using IPv4 Address XXXXXXXXX, CrashLoopBackOff

I used the AWS Kubernetes Quickstart to create a Kubernetes cluster in a VPC and private subnet: https://aws-quickstart.s3.amazonaws.com/quickstart-heptio/doc/heptio-kubernetes-on-the-aws-cloud.pdf . It was running fine for a while. I have Calico installed on my Kubernetes cluster, with two nodes and a master. The calico pods on the master are running fine; the ones on the nodes are in CrashLoopBackOff state:

NAME                                                               READY     STATUS             RESTARTS   AGE
calico-etcd-ztwjj                                                  1/1       Running            1          55d
calico-kube-controllers-685755779f-ftm92                           1/1       Running            2          55d
calico-node-gkjgl                                                  1/2       CrashLoopBackOff   270        22h
calico-node-jxkvx                                                  2/2       Running            4          55d
calico-node-mxhc5                                                  1/2       CrashLoopBackOff   9          25m

Describing one of the crashed pods:

ubuntu@ip-10-0-1-133:~$ kubectl describe pod calico-node-gkjgl -n kube-system
Name:           calico-node-gkjgl
Namespace:      kube-system
Node:           ip-10-0-0-237.us-east-2.compute.internal/10.0.0.237
Start Time:     Mon, 17 Sep 2018 16:56:41 +0000
Labels:         controller-revision-hash=185957727
                k8s-app=calico-node
                pod-template-generation=1
Annotations:    scheduler.alpha.kubernetes.io/critical-pod=
Status:         Running
IP:             10.0.0.237
Controlled By:  DaemonSet/calico-node
Containers:
  calico-node:
    Container ID:   docker://d89979ba963c33470139fd2093a5427b13c6d44f4c6bb546c9acdb1a63cd4f28
    Image:          quay.io/calico/node:v3.1.1
    Image ID:       docker-pullable://quay.io/calico/node@sha256:19fdccdd4a90c4eb0301b280b50389a56e737e2349828d06c7ab397311638d29
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 18 Sep 2018 15:14:44 +0000
      Finished:     Tue, 18 Sep 2018 15:14:44 +0000
    Ready:          False
    Restart Count:  270
    Requests:
      cpu:      250m
    Liveness:   http-get http://:9099/liveness delay=10s timeout=1s period=10s #success=1 #failure=6
    Readiness:  http-get http://:9099/readiness delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
      ETCD_ENDPOINTS:                     <set to the key 'etcd_endpoints' of config map 'calico-config'>  Optional: false
      CALICO_NETWORKING_BACKEND:          <set to the key 'calico_backend' of config map 'calico-config'>  Optional: false
      CLUSTER_TYPE:                       kubeadm,bgp
      CALICO_DISABLE_FILE_LOGGING:        true
      CALICO_K8S_NODE_REF:                 (v1:spec.nodeName)
      FELIX_DEFAULTENDPOINTTOHOSTACTION:  ACCEPT
      CALICO_IPV4POOL_CIDR:               192.168.0.0/16
      CALICO_IPV4POOL_IPIP:               Always
      FELIX_IPV6SUPPORT:                  false
      FELIX_IPINIPMTU:                    1440
      FELIX_LOGSEVERITYSCREEN:            info
      IP:                                 autodetect
      FELIX_HEALTHENABLED:                true
    Mounts:
      /lib/modules from lib-modules (ro)
      /var/lib/calico from var-lib-calico (rw)
      /var/run/calico from var-run-calico (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
  install-cni:
    Container ID:  docker://b37e0ec7eba690473a4999a31d9f766f7adfa65f800a7b2dc8e23ead7520252d
    Image:         quay.io/calico/cni:v3.1.1
    Image ID:      docker-pullable://quay.io/calico/cni@sha256:dc345458d136ad9b4d01864705895e26692d2356de5c96197abff0030bf033eb
    Port:          <none>
    Host Port:     <none>
    Command:
      /install-cni.sh
    State:          Running
      Started:      Mon, 17 Sep 2018 17:11:52 +0000
    Last State:     Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Mon, 17 Sep 2018 16:56:43 +0000
      Finished:     Mon, 17 Sep 2018 17:10:53 +0000
    Ready:          True
    Restart Count:  1
    Environment:
      CNI_CONF_NAME:       10-calico.conflist
      ETCD_ENDPOINTS:      <set to the key 'etcd_endpoints' of config map 'calico-config'>      Optional: false
      CNI_NETWORK_CONFIG:  <set to the key 'cni_network_config' of config map 'calico-config'>  Optional: false
    Mounts:
      /host/etc/cni/net.d from cni-net-dir (rw)
      /host/opt/cni/bin from cni-bin-dir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from calico-cni-plugin-token-b7sfl (ro)
Conditions:
  Type           Status
  Initialized    True
  Ready          False
  PodScheduled   True
Volumes:
  lib-modules:
    Type:          HostPath (bare host directory volume)
    Path:          /lib/modules
    HostPathType:
  var-run-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/run/calico
    HostPathType:
  var-lib-calico:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/calico
    HostPathType:
  cni-bin-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /opt/cni/bin
    HostPathType:
  cni-net-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/cni/net.d
    HostPathType:
  calico-cni-plugin-token-b7sfl:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  calico-cni-plugin-token-b7sfl
    Optional:    false
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     :NoSchedule
                 :NoExecute
                 :NoSchedule
                 :NoExecute
                 CriticalAddonsOnly
                 node.kubernetes.io/disk-pressure:NoSchedule
                 node.kubernetes.io/memory-pressure:NoSchedule
                 node.kubernetes.io/not-ready:NoExecute
                 node.kubernetes.io/unreachable:NoExecute
Events:
  Type     Reason   Age                  From                                               Message
  ----     ------   ----                 ----                                               -------
  Warning  BackOff  4m (x6072 over 22h)  kubelet, ip-10-0-0-237.us-east-2.compute.internal  Back-off restarting failed container

The logs for the same pod:

ubuntu@ip-10-0-1-133:~$ kubectl logs calico-node-gkjgl -n kube-system -c calico-node
2018-09-18 15:14:44.605 [INFO][8] startup.go 251: Early log level set to info
2018-09-18 15:14:44.605 [INFO][8] startup.go 269: Using stored node name from /var/lib/calico/nodename
2018-09-18 15:14:44.605 [INFO][8] startup.go 279: Determined node name: ip-10-0-0-237.us-east-2.compute.internal
2018-09-18 15:14:44.609 [INFO][8] startup.go 101: Skipping datastore connection test
2018-09-18 15:14:44.610 [INFO][8] startup.go 352: Building new node resource Name="ip-10-0-0-237.us-east-2.compute.internal"
2018-09-18 15:14:44.610 [INFO][8] startup.go 367: Initialize BGP data
2018-09-18 15:14:44.614 [INFO][8] startup.go 564: Using autodetected IPv4 address on interface ens3: 10.0.0.237/19
2018-09-18 15:14:44.614 [INFO][8] startup.go 432: Node IPv4 changed, will check for conflicts
2018-09-18 15:14:44.618 [WARNING][8] startup.go 861: Calico node 'ip-10-0-0-237' is already using the IPv4 address 10.0.0.237.
2018-09-18 15:14:44.618 [WARNING][8] startup.go 1058: Terminating
Calico node failed to start

So it seems there is a conflict in finding the node IP address: Calico thinks the IP is already assigned to another node. A quick search turned up this thread: https://github.com/projectcalico/calico/issues/1628 . It suggests this should be resolved by setting IP_AUTODETECTION_METHOD to can-reach=DESTINATION, which I'm assuming would be "can-reach=10.0.0.237". This setting is an environment variable on the calico/node container. I have been attempting to shell into the container itself, but kubectl tells me the container is not found:

ubuntu@ip-10-0-1-133:~$ kubectl exec calico-node-gkjgl --stdin --tty /bin/sh -c calico-node -n kube-system
error: unable to upgrade connection: container not found ("calico-node")

I'm suspecting this is due to Calico being unable to assign IPs. So I logged onto the host and attempted to shell into the container using Docker:

root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/bash
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory"

So I guess there is no shell to execute in the container, which explains why Kubernetes couldn't exec into it either. I tried running commands externally to list the environment variables, but I haven't been able to find any; I could be running these commands wrong, however:

root@ip-10-0-0-237:~# docker inspect -f '{{range $index, $value := .Config.Env}}{{$value}} {{end}}' k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 printenv IP_AUTODETECTION_METHOD
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"printenv\": executable file not found in $PATH"

root@ip-10-0-0-237:~# docker exec -it k8s_POD_calico-node-gkjgl_kube-system_a6998e98-ba9a-11e8-a9fa-0a97f5a48ef4_1 /bin/env
rpc error: code = 2 desc = oci runtime error: exec failed: container_linux.go:247: starting container process caused "exec: \"/bin/env\": stat /bin/env: no such file or directory"
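The `k8s_POD_*` name targeted in the commands above is the pod's pause (infrastructure) container, which only holds the pod's namespaces and ships neither a shell nor the application's environment variables, so these exec attempts cannot succeed. The environment of the actual calico-node container can be read from the host without a shell; this is a sketch, and the `k8s_calico-node_` name prefix is an assumption based on Docker's Kubernetes container-naming convention:

```shell
# Find the application container rather than the k8s_POD_ pause container
docker ps -a --format '{{.Names}}' | grep '^k8s_calico-node_'
# Dump its environment via inspect, which needs no shell inside the image
docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' <name from the previous command>
```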

Okay, so maybe I am going about this the wrong way. Should I change the Calico configuration through Kubernetes and redeploy it? Where can I find these settings on my system? I haven't been able to find where the environment variables are set.

If you look at the Calico docs, IP_AUTODETECTION_METHOD already defaults to first-found.

My guess is that something, or the IP address, is not being released by the previous 'run' of Calico, or it is simply a bug in the v3.1.1 version of Calico.
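That guess can be checked against the datastore directly: the warning in the log names a node 'ip-10-0-0-237' (short hostname), while this pod registered under the full name ip-10-0-0-237.us-east-2.compute.internal, which is the classic symptom of a stale node entry still holding the IP. A hedged sketch with calicoctl, assuming it is configured against the cluster's etcd datastore:

```shell
# List node entries known to Calico; look for a duplicate short-hostname entry
calicoctl get nodes
# If a stale 'ip-10-0-0-237' entry exists alongside the full-name node,
# deleting it releases the IPv4 address it claims
calicoctl delete node ip-10-0-0-237
```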

Try:

  1. Delete your Calico pods that are in a CrashLoopBackOff loop:

     kubectl -n kube-system delete pod calico-node-gkjgl calico-node-mxhc5

     Your pods will be re-created and hopefully initialize.

  2. Upgrade Calico to v3.1.3 or the latest version, following these docs. My guess is that Heptio's Calico installation is using the etcd datastore.

  3. Try to understand how Heptio's AWS AMIs work and see if there are any issues with them. This might take some time, so you could contact their support as well.

  4. Try a different method to install Kubernetes with Calico; this is well documented on https://kubernetes.io .
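If autodetection itself needs to be steered (as the linked GitHub issue suggests), the environment variable belongs on the calico-node DaemonSet rather than inside a running container; a sketch, using the address from the question:

```shell
# Add the variable to the calico-node container of the DaemonSet;
# the DaemonSet then rolls its pods with the new setting
kubectl -n kube-system set env daemonset/calico-node -c calico-node \
    IP_AUTODETECTION_METHOD=can-reach=10.0.0.237
```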

For me, what worked was to remove the leftover Docker networks on the nodes.
I had to list the current networks on each node (docker network list) and then remove the unneeded ones (docker network rm <networkName>).
After doing that, the calico deployment pods were running fine.
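A sketch of that cleanup on one node; the network name is a placeholder, and Docker's built-in bridge, host and none networks should be left alone:

```shell
# List Docker networks on the node and remove leftovers from previous runs
docker network ls
docker network rm <networkName>
# Then delete the still-crashing pod so the DaemonSet recreates it cleanly
kubectl -n kube-system delete pod calico-node-mxhc5
```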
