Docker swarm overlay network is not working for containers on different hosts

We have a networking problem in Docker swarm, described below:

  • We have a virtualized environment on VMware (vSphere 6.02).
  • Our servers are VMs created in VMware, say server1 and server2.
  • We have a docker-compose file defining a couple of services.
  • The docker-compose file defines an overlay network for Docker swarm.
  • When we deploy the system using Docker swarm, the deployment finishes successfully and all containers get an IP from the overlay network range.
  • The problem is that if 2 containers (say cnt1 and cnt2) are deployed to different servers, they cannot ping each other.
  • I checked with tcpdump and saw that ARP communication is successful, so the containers know each other's MAC addresses correctly.
  • But when you ping a container, the ICMP Echo messages are sent but are not delivered to the second machine.

Where should I check? Any advice?

    server-1:~$ docker version
    Client:
     Version:      17.03.0-ce
     API version:  1.26
     Go version:   go1.7.5
     Git commit:   3a232c8
     Built:        Tue Feb 28 08:01:32 2017
     OS/Arch:      linux/amd64

    Server:
     Version:      17.03.0-ce
     API version:  1.26 (minimum version 1.12)
     Go version:   go1.7.5
     Git commit:   3a232c8
     Built:        Tue Feb 28 08:01:32 2017
     OS/Arch:      linux/amd64
     Experimental: true

PS: I checked this post, but I have the latest version of Docker / Docker swarm, so that issue should already be fixed.

PS-2: a similar problem: https://github.com/docker/swarm/issues/2687

"VTEP Port is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application."

But we can change the Docker swarm data-path-port (port 4789 is used by default) to another port:

    docker swarm init --data-path-port=7789
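To confirm which data path port a swarm is actually using, the `docker info` output can be checked. The helper below is only a sketch: `data_path_port` is a hypothetical name, and it assumes the engine prints a `Data Path Port:` line in its Swarm section. As far as I know, the `--data-path-port` flag was added in Docker 19.03, so the 17.03 engine shown in the question would need an upgrade first. The port can only be set when the swarm is initialized and then applies to every node that joins.

```shell
# Hypothetical helper: read `docker info` output on stdin and print the
# value of the "Data Path Port:" line (assumed label), if one is present.
data_path_port() {
  sed -n 's/^ *Data Path Port: *//p'
}

# Usage on a swarm node (not executed here):
#   docker info 2>/dev/null | data_path_port
```

If nothing is printed, the engine is either not part of a swarm or too old to report the port.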

Out of curiosity, in your VMware environment, do you have NSX deployed? I may have an answer, but it only applies if NSX is deployed in the environment.

ESXi will apparently drop OUTBOUND packets from VMs if the destination port is the same as the port configured for VXLAN VTEP communication.

NSX uses port 4789/udp for VXLAN VTEP communication (by default, as of 6.2.3; prior to that, it was 8472/udp). (If the VMs are on the same host, the traffic is not dropped: although it is OUTBOUND traffic, it does not egress the host, so it never reaches the stage within the VMkernel where it would be dropped.)

The wording in KB2079386 is a little off. It states:

VXLAN port 8472 is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application.

But it should read:

VTEP Port is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application.

If you are using NSX, you could try changing the port used for the VXLAN VTEPs, but port 4789/udp is required if you are going to use hardware VTEPs at all.

(I can't take full credit for this. I stumbled across this blog post, which describes similar behavior, while troubleshooting a similar issue.)

The first thing I would check for overlay networking is your firewall rules. You need the following open between the hosts:

  • The swarm port, usually 2377/tcp; this is most likely already done
  • The overlay control port, 7946/tcp and 7946/udp
  • The overlay data port, 4789/udp
  • IPsec protocol 50, if your overlay networks are defined as "secure" (that's a protocol, not a port, so iptables -A INPUT -p 50 -j ACCEPT)
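The list above can be turned into iptables rules roughly as follows. This is a sketch rather than a complete firewall policy: `swarm_fw_rules` is a hypothetical helper, the peer address is a placeholder, and it only prints the rules so they can be reviewed before anything is applied.

```shell
# Hypothetical helper: print (do not apply) the INPUT rules that let one peer
# swarm node reach this host.
#   $1: IP address of the other node (placeholder value in the usage below)
#   $2: overlay data port, 4789 unless the swarm was initialized with another
swarm_fw_rules() {
  peer="$1"
  data_port="${2:-4789}"
  cat <<EOF
iptables -A INPUT -s $peer -p tcp --dport 2377 -j ACCEPT
iptables -A INPUT -s $peer -p tcp --dport 7946 -j ACCEPT
iptables -A INPUT -s $peer -p udp --dport 7946 -j ACCEPT
iptables -A INPUT -s $peer -p udp --dport $data_port -j ACCEPT
iptables -A INPUT -s $peer -p 50 -j ACCEPT
EOF
}

# Review the rules first, then apply them as root, for example:
#   swarm_fw_rules 192.0.2.11
#   swarm_fw_rules 192.0.2.11 | sudo sh
```

The last rule (protocol 50, ESP) only matters for encrypted overlay networks and can be dropped otherwise.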

If that doesn't help, look into using netshoot to debug where the traffic is getting stopped.
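One way to narrow it down, sketched below with two hypothetical helpers: capture the VXLAN packets on each host's physical interface while pinging, and open a netshoot shell inside the affected container's network namespace. The interface name `eth0` and the container name are assumptions; 4789 is Docker's default data port.

```shell
# Capture overlay (VXLAN) packets on the host itself.
#   $1: physical interface (eth0 is an assumption)
#   $2: overlay data port (4789 by default)
watch_vxlan() {
  sudo tcpdump -ni "${1:-eth0}" udp port "${2:-4789}"
}

# Start a throwaway netshoot container that shares the network namespace
# of an existing container, so ping/tcpdump see what that container sees.
#   $1: name or ID of the existing container
netshoot_into() {
  docker run -it --rm --network "container:$1" nicolaka/netshoot
}

# Usage (not executed here):
#   watch_vxlan eth0 4789     # run on both hosts
#   netshoot_into cnt1        # then ping cnt2 from the shell that opens
```

If the packets show up leaving the first host but never arrive on the second, something between the hosts (a firewall or the hypervisor) is dropping them.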

If your nodes are not on the same subnet (e.g. they all have public IPs), make sure that each node (other managers AND workers) joins the swarm with the --advertise-addr option set to an IP address that the other nodes can reach.

Otherwise the overlay network will not route correctly between hosts, even though stack deployment, node registration, etc. appear to be working fine.

See the detailed explanation of my case in the same GitHub issue: https://github.com/docker/swarm/issues/2687

Resolution to the issue mentioned above: use the following when initializing the swarm:

    docker swarm init --advertise-addr=YOURIP --listen-addr=0.0.0.0 --data-path-port=7779 --force-new-cluster=true

Thanks @Izkuru
