
Cannot ping docker container on another host using an overlay network

This question has been asked many times before on all kinds of forums, but unfortunately none of the answers have helped me so far.

I will get right to it.

 OS: RHEL 7.7 (Maipo)
 Docker version: Engine/Client 18.09.7

My configuration:

Host 1: IP 169.192.215.74
Host 2: IP 10.210.87.216

On Host 1:

# netstat -plntu|grep -E "4789|7946|2377"
tcp6       0      0 :::2377                 :::*                    LISTEN      3576/dockerd        
tcp6       0      0 :::7946                 :::*                    LISTEN      3576/dockerd        
udp        0      0 0.0.0.0:4789            0.0.0.0:*                           -                   
udp6       0      0 :::7946                 :::*                                3576/dockerd

# docker swarm init --advertise-addr=169.192.215.74

# docker network create --attachable --driver overlay syndichain

# docker swarm join-token manager


docker swarm join --token SWMTKN-1-2rxp0qlhl8o5n1hlmf0umcj8jdub19t07ndbu7mlodeb1yi4uf-ceopcr39eaaiz6bfqd4kk6ojs 169.192.215.74:2377

Ping Host 2:

# ping 10.210.87.16
PING 10.210.87.16 (10.210.87.16) 56(84) bytes of data.
64 bytes from 10.210.87.16: icmp_seq=1 ttl=119 time=0.804 ms
64 bytes from 10.210.87.16: icmp_seq=2 ttl=119 time=1.49 ms
64 bytes from 10.210.87.16: icmp_seq=3 ttl=119 time=0.646 ms
64 bytes from 10.210.87.16: icmp_seq=4 ttl=119 time=0.664 ms
^C
--- 10.210.87.16 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3003ms
rtt min/avg/max/mdev = 0.646/0.901/1.493/0.348 ms


# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:8d:a0:60 brd ff:ff:ff:ff:ff:ff
    inet 169.192.215.74/24 brd 169.192.215.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:a9:99:bf:c2 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
76: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ab:17:1e:2f brd ff:ff:ff:ff:ff:ff
    inet 172.19.0.1/16 brd 172.19.255.255 scope global docker_gwbridge
       valid_lft forever preferred_lft forever
468: veth49ae88e@if467: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether 82:e1:f6:1e:72:ec brd ff:ff:ff:ff:ff:ff link-netnsid 1


Bring up two containers (a RHEL 7 image from our local repository):

# docker run --rm -it --network="syndichain" -p 11002:11002 --name rhel.da85 957c8834c8a9 bash

# docker run --rm -it --network="syndichain" -p 11003:11003 --name rhel.da85.2 957c8834c8a9 bash

On Host 2:

    # docker swarm join --token SWMTKN-1-2rxp0qlhl8o5n1hlmf0umcj8jdub19t07ndbu7mlodeb1yi4uf-ceopcr39eaaiz6bfqd4kk6ojs 169.192.215.74:2377 --advertise-addr=10.210.87.216

    # docker node ls
    ID                            HOSTNAME                      STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
    yxphc861xra6ajc5llxrpp9ia *   sd-cfd1-0319   Ready               Active              Reachable           18.09.7
    2c45co3pj38u2gwqrvuawwi11     sd-d5d8-da85   Ready               Active              Leader              18.09.7

# netstat -plntu|grep -E "4789|7946|2377"
tcp6       0      0 :::2377                 :::*                    LISTEN      8562/dockerd        
tcp6       0      0 :::7946                 :::*                    LISTEN      8562/dockerd        
udp        0      0 0.0.0.0:4789            0.0.0.0:*                           -                   
udp6       0      0 :::7946                 :::*                                8562/dockerd   

Ping Host 1:

# ping 169.192.215.74
PING 169.192.215.74 (169.192.215.74) 56(84) bytes of data.
64 bytes from 169.192.215.74: icmp_seq=1 ttl=55 time=0.796 ms
64 bytes from 169.192.215.74: icmp_seq=2 ttl=55 time=0.530 ms
64 bytes from 169.192.215.74: icmp_seq=3 ttl=55 time=0.513 ms
^C
--- 169.192.215.74 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2000ms
rtt min/avg/max/mdev = 0.513/0.613/0.796/0.129 ms


# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:98:63:cb brd ff:ff:ff:ff:ff:ff
    inet 10.210.87.216/23 brd 10.210.87.255 scope global noprefixroute ens192
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:20:4a:de:3f brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
871: docker_gwbridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:f0:d9:46:4c brd ff:ff:ff:ff:ff:ff
    inet 172.23.0.1/16 brd 172.23.255.255 scope global docker_gwbridge
       valid_lft forever preferred_lft forever
885: veth4d635e9@if884: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether 86:4f:5b:81:89:5c brd ff:ff:ff:ff:ff:ff link-netnsid 1
892: veth2734e0c@if891: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master docker_gwbridge state UP group default 
    link/ether b2:6c:1d:94:99:3f brd ff:ff:ff:ff:ff:ff link-netnsid 4


Bring up two RHEL 7 containers on Host 2:

# docker run --rm -it --network="syndichain" --name rhel.0319 -p 11001:11001 957c8834c8a9 bash

# docker run --rm -it --network="syndichain" -p 11002:11002 --name rhel.0319.2 957c8834c8a9 bash

Now, we run some commands to check our overlay network:

On Host 1:

# docker network inspect syndichain
[
    {
        "Name": "syndichain",
        "Id": "8ar2ar0z0brl5lp1lk13xo4ik",
        "Created": "2020-02-28T11:44:01.411177608-05:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "fb911a7538d05f116d7f3ed50192e198d93faadaf1fb98f55fa79eb4cfbea72d": {
                "Name": "rhel.da85",
                "EndpointID": "8caf50c904f660c750bfa3608902d78239211e7076afed68d77a46e3e141386d",
                "MacAddress": "02:42:0a:00:00:02",
                "IPv4Address": "10.0.0.2/24",
                "IPv6Address": ""
            },
            "lb-syndichain": {
                "Name": "syndichain-endpoint",
                "EndpointID": "0a122599aee9d9b84dde6383aeb432580769af886278b46310a1c65bc68a0bc9",
                "MacAddress": "02:42:0a:00:00:03",
                "IPv4Address": "10.0.0.3/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "7f037ddf9a5f",
                "IP": "10.210.87.216"
            },
            {
                "Name": "b1b424f73d55",
                "IP": "169.192.215.74"
            }
        ]
    }
]

On Host 2:

# docker network inspect syndichain
[
    {
        "Name": "syndichain",
        "Id": "8ar2ar0z0brl5lp1lk13xo4ik",
        "Created": "2020-02-28T11:42:08.565999966-05:00",
        "Scope": "swarm",
        "Driver": "overlay",
        "EnableIPv6": false,
        "IPAM": {
            "Driver": "default",
            "Options": null,
            "Config": [
                {
                    "Subnet": "10.0.0.0/24",
                    "Gateway": "10.0.0.1"
                }
            ]
        },
        "Internal": false,
        "Attachable": true,
        "Ingress": false,
        "ConfigFrom": {
            "Network": ""
        },
        "ConfigOnly": false,
        "Containers": {
            "351d29af85e6e8addafcf69b3c464b49f8c9720cbaf8911328ab1aadce13e170": {
                "Name": "rhel.0319.2",
                "EndpointID": "fe0ed40c7a0b8687ca4a34d7b0196e382c32f3a5f82c43233174da349a8a5b34",
                "MacAddress": "02:42:0a:00:00:04",
                "IPv4Address": "10.0.0.4/24",
                "IPv6Address": ""
            },
            "62493b5072d9c2e516a0b8bfb9d0ba6dea9c2cbca2787bf4f8c2bf5ffe021dd0": {
                "Name": "rhel.0319",
                "EndpointID": "9a5e8977ca434854da886c215f14583cc5a7c5bb8811d5408a5cae9c4a325c47",
                "MacAddress": "02:42:0a:00:00:0e",
                "IPv4Address": "10.0.0.14/24",
                "IPv6Address": ""
            },
            "lb-syndichain": {
                "Name": "syndichain-endpoint",
                "EndpointID": "37415feb73f86cc3ceb1ab4d060da076ed4fca1f0b3ebe440993c82c14258a2d",
                "MacAddress": "02:42:0a:00:00:0f",
                "IPv4Address": "10.0.0.15/24",
                "IPv6Address": ""
            }
        },
        "Options": {
            "com.docker.network.driver.overlay.vxlanid_list": "4097"
        },
        "Labels": {},
        "Peers": [
            {
                "Name": "7f037ddf9a5f",
                "IP": "10.210.87.216"
            },
            {
                "Name": "b1b424f73d55",
                "IP": "169.192.215.74"
            }
        ]
    }
]

Now, I will demonstrate the problem:

Ping the rhel.da85.2 container from the rhel.da85 container (both containers on the same host):

[root@fb911a7538d0 /]# ping rhel.da85.2
PING rhel.da85.2 (10.0.0.5) 56(84) bytes of data.
64 bytes from rhel.da85.2.syndichain (10.0.0.5): icmp_seq=1 ttl=64 time=0.080 ms
64 bytes from rhel.da85.2.syndichain (10.0.0.5): icmp_seq=2 ttl=64 time=0.074 ms
64 bytes from rhel.da85.2.syndichain (10.0.0.5): icmp_seq=3 ttl=64 time=0.048 ms
^C
--- rhel.da85.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 1999ms
rtt min/avg/max/mdev = 0.048/0.067/0.080/0.015 ms

Ping the rhel.0319 container from the rhel.da85 container (containers on different hosts):

[root@fb911a7538d0 /]# ping rhel.0319
PING rhel.0319 (10.0.0.14) 56(84) bytes of data.

It gets stuck at this point. Nothing happens.


Ping the rhel.0319.2 container from the rhel.0319 container (both containers on the same host):

[root@62493b5072d9 /]# ping rhel.0319.2
PING rhel.0319.2 (10.0.0.4) 56(84) bytes of data.
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=1 ttl=64 time=0.109 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=2 ttl=64 time=0.061 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=3 ttl=64 time=0.064 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=4 ttl=64 time=0.068 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=5 ttl=64 time=0.064 ms
64 bytes from rhel.0319.2.syndichain (10.0.0.4): icmp_seq=6 ttl=64 time=0.063 ms
^C
--- rhel.0319.2 ping statistics ---
6 packets transmitted, 6 received, 0% packet loss, time 5000ms
rtt min/avg/max/mdev = 0.061/0.071/0.109/0.018 ms


Ping the rhel.da85 container from the rhel.0319 container (containers on different hosts):

[root@62493b5072d9 /]# ping rhel.da85  
PING rhel.da85 (10.0.0.2) 56(84) bytes of data.


It gets stuck at this point as well, just as it did when pinging a container on 0319 from da85.

I have even explicitly enabled IPv4 forwarding as per https://stackoverflow.com/a/41453306/10382340.
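
For reference, this is roughly what enabling IPv4 forwarding amounts to on each host — a minimal sketch, assuming the linked answer refers to the standard net.ipv4.ip_forward sysctl on RHEL 7 (the drop-in file name below is illustrative):

# sysctl -w net.ipv4.ip_forward=1                                       # apply immediately
# echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-ipforward.conf      # persist across reboots
# sysctl --system                                                       # reload all sysctl config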

However, I have still been stuck on this issue for the past week. I would appreciate any help.

Update:

Someone suggested running tcpdump and nsenter. Here are the logs from tcpdump running at the container level (on the same VM where the ping request is being generated), along with the nsenter logs filtered for ARP. You can see that the ARP request never gets a response in the container-level tcpdump, but nsenter shows request/reply pairs for ARP.

[root@0ae70bfa66ae /]# tcpdump -v
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
16:26:03.276397 IP (tos 0x0, ttl 64, id 34495, offset 0, flags [DF], proto ICMP (1), length 84)
    rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 1, length 64
16:26:04.276569 IP (tos 0x0, ttl 64, id 34971, offset 0, flags [DF], proto ICMP (1), length 84)
    rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 2, length 64
16:26:05.276526 IP (tos 0x0, ttl 64, id 35636, offset 0, flags [DF], proto ICMP (1), length 84)
    rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 3, length 64
16:26:06.276545 IP (tos 0x0, ttl 64, id 35935, offset 0, flags [DF], proto ICMP (1), length 84)
    rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 4, length 64
16:26:07.276542 IP (tos 0x0, ttl 64, id 36706, offset 0, flags [DF], proto ICMP (1), length 84)
    rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 5, length 64
16:26:08.276543 IP (tos 0x0, ttl 64, id 37284, offset 0, flags [DF], proto ICMP (1), length 84)
    rhel.0319.syndichain > rhel.da85.syndichain: ICMP echo request, id 62, seq 6, length 64
16:26:08.294475 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has rhel.da85.syndichain tell rhel.0319.syndichain, length 28




# nsenter --net=/var/run/docker/netns/1-khyguu9i6n tcpdump -peni any "arp"

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
11:45:23.302460   P 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:23.302471 Out 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:23.302485  In 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:45:23.302488 Out 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:45:44.294458   P 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:44.294474 Out 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:45:44.294477  In 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:45:44.294479 Out 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:46:05.302460   P 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:46:05.302480 Out 02:42:0a:00:00:0b ethertype ARP (0x0806), length 44: Request who-has 10.0.0.2 tell 10.0.0.11, length 28
11:46:05.302483  In 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28
11:46:05.302485 Out 02:42:0a:00:00:02 ethertype ARP (0x0806), length 44: Reply 10.0.0.2 is-at 02:42:0a:00:00:02, length 28

Update 2:

From a container on the source host:

[root@2baddaa98452 /]# nc -zvu rhel.da85 4789
Ncat: Version 7.50 ( https://nmap.org/ncat )
Ncat: Connected to 10.0.0.2:4789.
Ncat: UDP packet sent successfully
Ncat: 1 bytes sent, 0 bytes received in 2.02 seconds.

tcpdump on the source host:

# tcpdump -i ens192 port 4789
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes
11:28:11.706587 IP sd-cfd1-0319.59121 > sd-d5d8-da85.4789: VXLAN, flags [I] (0x08), vni 4097
IP 10.0.0.11.44806 > 10.0.0.2.4789:  [|VXLAN]

tcpdump on the target host:

# tcpdump -i ens192 port 4789
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens192, link-type EN10MB (Ethernet), capture size 262144 bytes

From your comment, it appears that the overlay networking ports are blocked between the nodes. This could be iptables or another firewall tool on the host, a network firewall between the nodes, or other software, such as VM tooling or a cloud router ACL, blocking those ports. The ports that need to be opened are (see the sketch after this list for one way to open them):

  • TCP and UDP port 7946 for communication among nodes
  • UDP port 4789 for overlay network traffic

Ref: https://docs.docker.com/network/overlay/
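
On RHEL 7, a minimal sketch of opening these ports, assuming firewalld is the active host firewall (adjust accordingly for raw iptables or a firewall external to the hosts), run on every node:

# firewall-cmd --permanent --add-port=7946/tcp     # node-to-node gossip (TCP)
# firewall-cmd --permanent --add-port=7946/udp     # node-to-node gossip (UDP)
# firewall-cmd --permanent --add-port=4789/udp     # VXLAN overlay data traffic
# firewall-cmd --reload

Note that an external firewall between the two subnets (169.192.215.0/24 and 10.210.86.0/23) would also have to permit these ports; host-level rules alone may not be enough.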

If you want to use the overlay network and communicate across nodes, deploy as a service rather than as a standalone container (aka docker run).
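
For example, a minimal sketch of the service-based equivalent on the same overlay (the service name and the sleep command are illustrative placeholders):

# docker service create --name rhel-test --network syndichain --replicas 2 957c8834c8a9 sleep infinity

The scheduler decides which nodes the two replicas land on; docker service ps rhel-test shows the placement.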
