
Errors setting up WeaveNet in Kubernetes with "peer names collision"

I am setting up a Kubernetes cluster and can't get the Weave network up properly.

I have 3 nodes: rowlf (master), rizzo and fozzie. The pods look fine:

NAMESPACE     NAME                                READY     STATUS    RESTARTS   AGE
kube-system   pod/etcd-rowlf                      1/1       Running   0          32m
kube-system   pod/kube-apiserver-rowlf            1/1       Running   9          33m
kube-system   pod/kube-controller-manager-rowlf   1/1       Running   0          32m
kube-system   pod/kube-dns-686d6fb9c-kjdxt        3/3       Running   0          33m
kube-system   pod/kube-proxy-6kpr9                1/1       Running   0          9m
kube-system   pod/kube-proxy-f7nk5                1/1       Running   0          33m
kube-system   pod/kube-proxy-nrbbl                1/1       Running   0          21m
kube-system   pod/kube-scheduler-rowlf            1/1       Running   0          32m
kube-system   pod/weave-net-4sj4n                 2/2       Running   1          21m
kube-system   pod/weave-net-kj6q7                 2/2       Running   1          9m
kube-system   pod/weave-net-nsp22                 2/2       Running   0          30m

But the weave status shows failures:

$ kubectl exec -n kube-system weave-net-nsp22 -c weave -- /home/weave/weave --local status

Version: 2.3.0 (up to date; next check at 2018/06/14 00:30:09)

Service: router
Protocol: weave 1..2
Name: 7a:8f:22:1f:0a:17(rowlf)
Encryption: disabled
PeerDiscovery: enabled
Targets: 1
Connections: 1 (1 failed)
Peers: 1
TrustedSubnets: none

Service: ipam
Status: ready
Range: 10.32.0.0/12
DefaultSubnet: 10.32.0.0/12
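
The `Connections: 1 (1 failed)` line in the status output above is the quickest sign of trouble. A minimal sketch of a check that could be scripted around it follows. The function name `failed_connections` is hypothetical, and the sketch assumes the status text keeps the `Connections: N (... M failed)` layout shown above.

```shell
# Hypothetical helper: reads `weave status` text on stdin and prints the
# number of failed connections (0 when no "N failed" figure is present).
failed_connections() {
  n=$(sed -nE 's/^[[:space:]]*Connections:.*[^0-9]([0-9]+) failed.*/\1/p')
  echo "${n:-0}"
}

# Usage against the pod from the question (needs cluster access):
# kubectl exec -n kube-system weave-net-nsp22 -c weave -- \
#   /home/weave/weave --local status | failed_connections
```

Running the same status command with `connections` appended lists each connection with its state, which helps to see which peer is failing.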

First, I do not understand why the connection is marked as failed. Second, in the logs I found these lines:

INFO: 2018/06/13 17:22:59.170536 ->[172.16.20.12:54077] connection accepted
INFO: 2018/06/13 17:22:59.480262 ->[172.16.20.12:54077|7a:8f:22:1f:0a:17(rowlf)]: connection shutting down due to error: local "7a:8f:22:1f:0a:17(rowlf)" and remote "7a:8f:22:1f:0a:17(rizzo)" peer names collision
INFO: 2018/06/13 17:34:12.668693 ->[172.16.20.13:52541] connection accepted
INFO: 2018/06/13 17:34:12.672113 ->[172.16.20.13:52541|7a:8f:22:1f:0a:17(rowlf)]: connection shutting down due to error: local "7a:8f:22:1f:0a:17(rowlf)" and remote "7a:8f:22:1f:0a:17(fozzie)" peer names collision

The second thing I do not understand is the "peer names collision" error. Is this normal?
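
In the log lines above, the local and remote peers announce the same name (`7a:8f:22:1f:0a:17`) and differ only in the nickname in parentheses. A small sketch for pulling the rejected peers out of a longer log follows. The function name `colliding_peers` is hypothetical, and the sketch assumes the message format stays as quoted above.

```shell
# Hypothetical helper: reads weave router log lines on stdin and prints each
# distinct remote peer that was rejected with a "peer names collision" error.
colliding_peers() {
  grep 'peer names collision' \
    | sed -nE 's/.*remote "([^"]+)".*/\1/p' \
    | sort -u
}

# Usage against the pod from the question (needs cluster access):
# kubectl logs -n kube-system weave-net-nsp22 -c weave | colliding_peers
```

If the same name appears with several nicknames, more than one node is presenting the same peer identity to the router.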

This is the log from "rizzo":

kubectl logs weave-net-4sj4n -n kube-system weave

DEBU: 2018/06/13 17:22:58.731864 [kube-peers] Checking peer "7a:8f:22:1f:0a:17" against list &{[{7a:8f:22:1f:0a:17 rowlf}]}
INFO: 2018/06/13 17:22:58.833350 Command line options: map[conn-limit:100 docker-api: host-root:/host http-addr:127.0.0.1:6784 ipalloc-range:10.32.0.0/12 no-dns:true expect-npc:true name:7a:8f:22:1f:0a:17 datapath:datapath db-prefix:/weavedb/weave-net ipalloc-init:consensus=2 metrics-addr:0.0.0.0:6782 nickname:rizzo port:6783]
INFO: 2018/06/13 17:22:58.833525 weave  2.3.0
INFO: 2018/06/13 17:22:59.119956 Bridge type is bridged_fastdp
INFO: 2018/06/13 17:22:59.120025 Communication between peers is unencrypted.
INFO: 2018/06/13 17:22:59.141576 Our name is 7a:8f:22:1f:0a:17(rizzo)
INFO: 2018/06/13 17:22:59.141787 Launch detected - using supplied peer list: [172.16.20.12 172.16.20.11]
INFO: 2018/06/13 17:22:59.141894 Checking for pre-existing addresses on weave bridge
INFO: 2018/06/13 17:22:59.157517 [allocator 7a:8f:22:1f:0a:17] Initialising with persisted data
INFO: 2018/06/13 17:22:59.157884 Sniffing traffic on datapath (via ODP)
INFO: 2018/06/13 17:22:59.158806 ->[172.16.20.11:6783] attempting connection
INFO: 2018/06/13 17:22:59.159081 ->[172.16.20.12:6783] attempting connection
INFO: 2018/06/13 17:22:59.159815 ->[172.16.20.12:42371] connection accepted
INFO: 2018/06/13 17:22:59.161572 ->[172.16.20.12:6783|7a:8f:22:1f:0a:17(rizzo)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/06/13 17:22:59.161836 ->[172.16.20.12:42371|7a:8f:22:1f:0a:17(rizzo)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/06/13 17:22:59.265736 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2018/06/13 17:22:59.266483 Listening for metrics requests on 0.0.0.0:6782
INFO: 2018/06/13 17:22:59.443937 ->[172.16.20.11:6783|7a:8f:22:1f:0a:17(rizzo)]: connection shutting down due to error: local "7a:8f:22:1f:0a:17(rizzo)" and remote "7a:8f:22:1f:0a:17(rowlf)" peer names collision
INFO: 2018/06/13 17:23:00.355761 [kube-peers] Added myself to peer list &{[{7a:8f:22:1f:0a:17 rowlf}]}
DEBU: 2018/06/13 17:23:00.367309 [kube-peers] Nodes that have disappeared: map[]
INFO: 2018/06/13 17:34:12.671287 ->[172.16.20.13:60523] connection accepted
INFO: 2018/06/13 17:34:12.674712 ->[172.16.20.13:60523|7a:8f:22:1f:0a:17(rizzo)]: connection shutting down  due to error: local "7a:8f:22:1f:0a:17(rizzo)" and remote "7a:8f:22:1f:0a:17(fozzie)" peer names collision

I ask because I have now reinstalled everything from scratch for the fourth time, and every time I have trouble connecting from traefik to a pod on another host. I blame the network, because it does not look healthy. Can you please tell me whether the setup is correct so far? Are the errors normal, or do I have to care about them? And finally: how do I ask for help, and what information do I have to provide to make it easy for people like you to help me out of this frustrating position?

This is my version:

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.2", GitCommit:"81753b10df112992bf51bbc2c2f85208aad78335", GitTreeState:"clean", BuildDate:"2018-04-27T09:22:21Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/arm"}
Server Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.4", GitCommit:"5ca598b4ba5abb89bb773071ce452e33fb66339d", GitTreeState:"clean", BuildDate:"2018-06-06T08:00:59Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"linux/arm"}

Thank you.

++++ UPDATE ++++ I reset the machine-id as mentioned here: https://github.com/weaveworks/weave/issues/2767 But this causes my machines to reboot constantly!

kernel:[ 2257.674153] Internal error: Oops: 80000007 [#1] SMP ARM

Finally I found the solution here: https://github.com/weaveworks/weave/issues/3314 We have to disable fastDP!

I had this same issue. Disabling fastDP didn't work for me, but I found that the cause was that all the nodes had the same value for /etc/machine-id, because I had cloned them from the same OS image.
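
Whether the nodes share a machine ID can be checked from the API server before touching any node, since Kubernetes records the value in each node's `.status.nodeInfo.machineID` field. The sketch below assumes `kubectl` access to the cluster; the function name `duplicate_machine_ids` is hypothetical.

```shell
# Hypothetical helper: reads "NODE MACHINE_ID" pairs on stdin and prints
# every machine ID that appears on more than one node.
duplicate_machine_ids() {
  awk '{ print $2 }' | sort | uniq -d
}

# Usage against a live cluster (needs kubectl access):
# kubectl get nodes --no-headers \
#   -o custom-columns=NAME:.metadata.name,ID:.status.nodeInfo.machineID \
#   | duplicate_machine_ids
```

Empty output means every node reports a distinct ID. Any ID that is printed is shared by at least two nodes.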

I deleted the machine IDs from all of the nodes and generated new ones with the following commands:

sudo rm /etc/machine-id
sudo systemd-machine-id-setup

and then reset my cluster.

For me, 'sudo systemd-machine-id-setup' generated the same machine-id as the old one. I just edited the machine-id by hand and it worked.
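
The comment above does not say how the ID was edited. One way to produce a replacement value, sketched under the assumption of a Linux system that exposes /proc/sys/kernel/random/uuid, is to strip the dashes from a random UUID, which gives the 32 lowercase hexadecimal characters that /etc/machine-id holds. The function name `new_machine_id` is hypothetical.

```shell
# Hypothetical helper: prints a fresh random ID as 32 lowercase hex characters,
# the format used by /etc/machine-id. Assumes the Linux kernel UUID source.
new_machine_id() {
  tr -d '-' < /proc/sys/kernel/random/uuid
}

# Usage on each node (needs root; back up the old file first):
# sudo cp /etc/machine-id /etc/machine-id.bak
# new_machine_id | sudo tee /etc/machine-id
```

If `systemd-machine-id-setup` keeps restoring the old value, one possible explanation (an assumption, not confirmed in this thread) is a copy of the ID in /var/lib/dbus/machine-id, which that tool reuses when it is present.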

