Why can't Docker containers communicate with each other?
I have created a small project to test Docker clustering. Basically, the cluster.sh script launches three identical containers and uses pipework to configure a bridge (bridge1) on the host and add a NIC (eth1) to each container.
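The cluster.sh script itself isn't shown, so the following is only a hypothetical sketch of the kind of setup it performs (the image name and the pipework invocation are assumptions based on the description above):

```shell
# Hypothetical sketch of cluster.sh: launch three identical containers
# and wire each one's eth1 to bridge1 via pipework.
# "my-cluster-image" is a placeholder; pipework creates bridge1 if needed.
for i in 1 2 3; do
  cid=$(docker run -d my-cluster-image)
  pipework bridge1 "$cid" "172.17.99.$i/24"
done
```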
If I log into one of the containers, I can arping the other containers:
# 172.17.99.1
root@d01eb56fce52:/# arping 172.17.99.2
ARPING 172.17.99.2
42 bytes from aa:b3:98:92:0b:08 (172.17.99.2): index=0 time=1.001 sec
42 bytes from aa:b3:98:92:0b:08 (172.17.99.2): index=1 time=1.001 sec
42 bytes from aa:b3:98:92:0b:08 (172.17.99.2): index=2 time=1.001 sec
42 bytes from aa:b3:98:92:0b:08 (172.17.99.2): index=3 time=1.001 sec
^C
--- 172.17.99.2 statistics ---
5 packets transmitted, 4 packets received, 20% unanswered (0 extra)
So it seems packets can go through bridge1.

But the problem is that I can't ping the other containers, nor can I send any IP packets through tools like telnet or netcat.
In contrast, the bridge docker0 and NIC eth0 work correctly in all containers.

Here's my route table:
# 172.17.99.1
root@d01eb56fce52:/# ip route
default via 172.17.42.1 dev eth0
172.17.0.0/16 dev eth0 proto kernel scope link src 172.17.0.17
172.17.99.0/24 dev eth1 proto kernel scope link src 172.17.99.1
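As a side note on why a ping to 172.17.99.2 leaves via eth1 at all: both routes above match, and the kernel picks the most specific prefix. A small bash sketch of that longest-prefix-match decision (the routes are hard-coded from the table above):

```shell
# Both routes match 172.17.99.2; the kernel chooses the longest prefix,
# which is why traffic for 172.17.99.0/24 leaves via eth1.
ip2int() {                       # dotted quad -> 32-bit integer
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

pick_route() {                   # pick_route ADDR -> prints eth0 or eth1
  addr=$(ip2int "$1"); best_len=-1; best_dev=""
  while read -r net bits dev; do
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    if [ $(( addr & mask )) -eq $(( $(ip2int "$net") & mask )) ] \
       && [ "$bits" -gt "$best_len" ]; then
      best_len=$bits; best_dev=$dev
    fi
  done <<EOF
172.17.0.0 16 eth0
172.17.99.0 24 eth1
EOF
  echo "$best_dev"
}

pick_route 172.17.99.2   # prints: eth1 (the /24 beats the /16)
pick_route 172.17.0.5    # prints: eth0
```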
and bridge config:
# host
$ brctl show
bridge name bridge id STP enabled interfaces
bridge1 8000.8a6b21e27ae6 no veth1pl25432
veth1pl25587
veth1pl25753
docker0 8000.56847afe9799 no veth7c87801
veth953a086
vethe575fe2
# host
$ brctl showmacs bridge1
port no mac addr is local? ageing timer
1 8a:6b:21:e2:7a:e6 yes 0.00
2 8a:a3:b8:90:f3:52 yes 0.00
3 f6:0c:c4:3d:f5:b2 yes 0.00
# host
$ ifconfig
bridge1 Link encap:Ethernet HWaddr 8a:6b:21:e2:7a:e6
inet6 addr: fe80::48e9:e3ff:fedb:a1b6/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:163 errors:0 dropped:0 overruns:0 frame:0
TX packets:68 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:8844 (8.8 KB) TX bytes:12833 (12.8 KB)
# I'm showing only one veth here for simplicity
veth1pl25432 Link encap:Ethernet HWaddr 8a:6b:21:e2:7a:e6
inet6 addr: fe80::886b:21ff:fee2:7ae6/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:155 errors:0 dropped:0 overruns:0 frame:0
TX packets:162 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:12366 (12.3 KB) TX bytes:23180 (23.1 KB)
...
and the IP FORWARD chain:
# host
$ sudo iptables -x -v --line-numbers -L FORWARD
Chain FORWARD (policy ACCEPT 10675 packets, 640500 bytes)
num pkts bytes target prot opt in out source destination
1 15018 22400195 DOCKER all -- any docker0 anywhere anywhere
2 15007 22399271 ACCEPT all -- any docker0 anywhere anywhere ctstate RELATED,ESTABLISHED
3 8160 445331 ACCEPT all -- docker0 !docker0 anywhere anywhere
4 11 924 ACCEPT all -- docker0 docker0 anywhere anywhere
5 56 4704 ACCEPT all -- bridge1 bridge1 anywhere anywhere
Note that the pkts count for rule 5 isn't 0, which means the ping packets were routed correctly (the FORWARD chain is traversed after routing, right?), but somehow they didn't reach the destination.
I'm out of ideas as to why docker0 and bridge1 behave differently. Any suggestions?
Update 1
Here's the tcpdump output on the target container when pinged from another container:
$ tcpdump -i eth1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth1, link-type EN10MB (Ethernet), capture size 65535 bytes
22:11:17.754261 IP 192.168.1.65 > 172.17.99.1: ICMP echo request, id 26443, seq 1, length 6
Note that the source IP is 192.168.1.65, which is the host's eth0 address, so some SNAT seems to be happening at the bridge.
Finally, printing out the nat IP table revealed the cause of the problem:
$ sudo iptables -L -t nat
...
Chain POSTROUTING (policy ACCEPT)
target prot opt source destination
MASQUERADE all -- 172.17.0.0/16 anywhere
...
Because my container's eth1 IP (172.17.99.1) lies inside 172.17.0.0/16, outgoing packets match this MASQUERADE rule and have their source IP rewritten. This is why the responses to ping can't make it back to the source.
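The overlap can be checked with a little bash arithmetic (a sketch; the addresses are the ones from the outputs above):

```shell
# The MASQUERADE rule matches source 172.17.0.0/16, and every eth1 address
# handed out here (172.17.99.x) falls inside that range.
ip2int() {                       # dotted quad -> 32-bit integer
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

in_subnet() {   # in_subnet ADDR NET PREFIXLEN -> exit 0 if ADDR is inside NET
  mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
  [ $(( $(ip2int "$1") & mask )) -eq $(( $(ip2int "$2") & mask )) ]
}

in_subnet 172.17.99.1 172.17.0.0 16 && echo "172.17.99.1 gets masqueraded"
in_subnet 192.168.99.1 172.17.0.0 16 || echo "192.168.99.1 escapes the rule"
```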
Conclusion

The solution is to put the containers' eth1 addresses on a network that doesn't overlap the default docker0 subnet (172.17.0.0/16).
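For example, re-running the pipework step with a subnet outside 172.17.0.0/16 keeps the eth1 traffic clear of the MASQUERADE rule (a hypothetical sketch; the container names are placeholders):

```shell
# Put eth1 on 192.168.99.0/24 instead of 172.17.99.0/24, so the source
# addresses no longer fall inside the 172.17.0.0/16 MASQUERADE range.
pipework bridge1 node1 192.168.99.1/24
pipework bridge1 node2 192.168.99.2/24
pipework bridge1 node3 192.168.99.3/24
```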
Copied from Update 1 in the question.