Docker-swarm overlay network is not working for containers in different hosts

We have a networking problem in Docker swarm. The details:

  • We have a virtualized environment on VMware (vSphere 6.02).
  • Our servers are VMs created in VMware, say server1 and server2.
  • We have a Docker Compose file defining a couple of services.
  • The Compose file defines an overlay network for Docker swarm.
  • When we deploy the system with Docker swarm, the deployment finishes successfully and every container gets an IP from the overlay network range.
  • However, if two containers (say cnt1 and cnt2) are deployed to different servers, they cannot ping each other.
  • I checked with tcpdump and saw that the ARP exchange succeeds, so each container knows the other's MAC address.
  • But when you ping the other container, the ICMP Echo requests are sent but never delivered to the second machine.

Where should I check? Any advice?

    server-1:~$ docker version
    Client:
     Version:      17.03.0-ce
     API version:  1.26
     Go version:   go1.7.5
     Git commit:   3a232c8
     Built:        Tue Feb 28 08:01:32 2017
     OS/Arch:      linux/amd64

    Server:
     Version:      17.03.0-ce
     API version:  1.26 (minimum version 1.12)
     Go version:   go1.7.5
     Git commit:   3a232c8
     Built:        Tue Feb 28 08:01:32 2017
     OS/Arch:      linux/amd64
     Experimental: true

ps: I checked this post, but I have the latest version of docker / docker-swarm, so that issue should already be fixed.

ps-2: similar problem; https://github.com/docker/swarm/issues/2687

"VTEP Port is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application."

But we can change the Docker swarm data path port (4789/udp by default) to another one when initializing the swarm. The --data-path-port flag is available in Docker 19.03 and later:

    docker swarm init --data-path-port=7789

Out of curiosity, in your VMware environment, do you have NSX deployed? I may have an answer, but it only applies if NSX is deployed in the environment.

ESXi will apparently drop OUTBOUND packets from VMs if the destination port is the same as the port configured for the VXLAN VTEP communication.

NSX utilizes port 4789/udp for VTEP communication for VXLAN (by default, as of 6.2.3; prior to that, it was 8472/udp ). (If the VMs are on the same host, then traffic is not dropped, because, while it may be OUTBOUND traffic, it does not egress the host, and does not get to the same stage within the VMKernel to be dropped.)
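One way to confirm this behavior is to capture the overlay's VXLAN traffic on both VMs while pinging across hosts: if the packets leave the sender's interface but never show up on the receiver, something between the two VMs is dropping them. A minimal sketch, assuming the VM's uplink interface is named eth0 (a placeholder; the function name is illustrative too). It only prints the tcpdump command, so it can be reviewed and then run as root on each host:

```shell
#!/bin/sh
# Build a tcpdump command that captures the overlay's VXLAN data traffic.
# $1: host interface (assumed "eth0" here; adjust to your VM's NIC)
# $2: data path port (4789/udp is the swarm default mentioned above)
vxlan_capture_cmd() {
    iface="${1:-eth0}"
    port="${2:-4789}"
    echo "tcpdump -ni $iface udp port $port"
}

# Print the command for the default port; run the output as root on both hosts.
vxlan_capture_cmd eth0 4789
```

If the swarm was initialized with a different data path port, pass that port as the second argument instead.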

The wording in KB2079386 is a little off. It states:

VXLAN port 8472 is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application.

But, it should read:

VTEP Port is reserved or restricted for VMware use, any virtual machine cannot use this port for other purpose or for any other application.

If you are using NSX, you could try changing the port used for the VXLAN VTEPs, but port 4789/udp is required if you are going to leverage hardware VTEPs at all.

(I can't take full credit for this. I stumbled across this blog post talking about similar behavior when troubleshooting a similar issue.)

The first thing I would check for overlay networking is your firewall rules. You need the following open between the hosts:

  • The swarm port, usually 2377/tcp (this is most likely already open)
  • The overlay control port 7946/tcp and 7946/udp
  • The overlay data port 4789/udp
  • The IPsec protocol 50 if your overlay networks are defined as "secure" (that's a protocol, not a port, so iptables -A INPUT -p 50 -j ACCEPT )
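The list above could be turned into iptables rules roughly as follows. This is a sketch under assumptions: the peer address is a placeholder, the function name is made up for illustration, and the rules are only printed (not applied) so they can be reviewed and adapted to the host's existing chains first:

```shell
#!/bin/sh
# Print (do not apply) iptables rules that accept swarm traffic from one peer.
# $1: address of the other swarm host (placeholder value below)
swarm_fw_rules() {
    peer="$1"
    echo "iptables -A INPUT -s $peer -p tcp --dport 2377 -j ACCEPT"  # swarm management
    echo "iptables -A INPUT -s $peer -p tcp --dport 7946 -j ACCEPT"  # overlay control
    echo "iptables -A INPUT -s $peer -p udp --dport 7946 -j ACCEPT"  # overlay control
    echo "iptables -A INPUT -s $peer -p udp --dport 4789 -j ACCEPT"  # overlay data (VXLAN)
    echo "iptables -A INPUT -s $peer -p 50 -j ACCEPT"                # only for "secure" overlays
}

swarm_fw_rules 192.0.2.20
```

Run it once per peer on every host. If the data path port was changed at swarm init, the 4789 rule needs the new port.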

If that doesn't help, look into using netshoot to debug where the traffic is getting stopped.
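As a rough illustration of the netshoot approach: the image can be attached to an existing container's network namespace, which gives you tcpdump, ping and similar tools on that container's overlay interface. The sketch below assumes the image name nicolaka/netshoot (not stated above) and uses cnt1 from the question as a placeholder container name; it only prints the command:

```shell
#!/bin/sh
# Build a docker command that starts netshoot inside another container's
# network namespace. $1: name or ID of the target container (placeholder).
netshoot_cmd() {
    echo "docker run -it --rm --network container:$1 nicolaka/netshoot"
}

netshoot_cmd cnt1
```

From the resulting shell, pinging the other container's overlay IP while capturing on the host shows on which side the traffic stops.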

If your nodes are not on the same subnet (e.g. they all have public IPs), make sure you use the --advertise-addr option, specifying an IP address that the other nodes can reach, whenever a node (other managers AND workers) joins the swarm.

Otherwise the overlay network will not route correctly between hosts, even though stack deployment, node registration, etc. appear to be working fine.
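For a joining node, that could look like the sketch below. The addresses are documentation placeholders, the function name is illustrative, and the token is whatever "docker swarm join-token -q worker" prints on a manager; the function only prints the command:

```shell
#!/bin/sh
# Build a "docker swarm join" command that advertises an explicit address.
# $1: this node's address as reachable by the other nodes (placeholder)
# $2: a manager's address (placeholder); 2377/tcp is the usual swarm port
# $3: join token obtained on a manager
swarm_join_cmd() {
    echo "docker swarm join --advertise-addr $1 --token $3 $2:2377"
}

swarm_join_cmd 203.0.113.10 203.0.113.1 '<token>'
```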

See the detailed explanation for my case in the same GitHub issue --> https://github.com/docker/swarm/issues/2687

This resolves the issue described above. Use the following when initializing the swarm:

    docker swarm init --advertise-addr=YOURIP --listen-addr=0.0.0.0 --data-path-port=7779 --force-new-cluster=true

Thanks @Izkuru
