
Failed to join a node to Docker Swarm

I have two servers in a Docker Swarm, but when I try to add a third server, I get this result:

Error response from daemon: rpc error: code = 14 desc = grpc: the connection is unavailable

All servers are in one network.

What could be the problem?

I'd say it's possibly firewall related. Ensure your ports are configured correctly on the third box. From the Docker docs:

Open protocols and ports between the hosts

The following ports must be available. On some systems, these ports are open by default.

TCP port 2377 for cluster management communications
TCP and UDP port 7946 for communication among nodes
UDP port 4789 for overlay network traffic
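A quick way to check the TCP side of this from the third server is to probe the manager's port directly. Below is a minimal sketch, assuming bash and the coreutils timeout command are installed; the function name tcp_port_open and the address 192.168.1.10 are placeholders, not something from the question. It cannot test the UDP ports (7946, 4789), since UDP has no connection handshake to observe.

```shell
# Return 0 if a TCP connection to host:port succeeds within the timeout.
# Usage: tcp_port_open HOST PORT [TIMEOUT_SECONDS]
tcp_port_open() {
    # bash's /dev/tcp pseudo-device opens a TCP connection on redirect;
    # timeout stops the attempt if a firewall silently drops the packets.
    timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Example (placeholder address for the swarm manager):
#   tcp_port_open 192.168.1.10 2377 && echo "2377 reachable" || echo "2377 not reachable"
```

If 2377 is not reachable, either a firewall is in the way or nothing is listening on the manager at that address.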

From the official Docker Swarm tutorial:

The following ports must be open on your Docker hosts.

TCP port 2377 for cluster management communications
TCP and UDP port 7946 for communication among nodes   
UDP port 4789 for overlay network traffic

To open these ports, run the commands below on all your Docker hosts. Please follow the DigitalOcean article for the complete steps.

firewall-cmd --add-port=2376/tcp --permanent
firewall-cmd --add-port=2377/tcp --permanent
firewall-cmd --add-port=7946/tcp --permanent
firewall-cmd --add-port=7946/udp --permanent
firewall-cmd --add-port=4789/udp --permanent
# --permanent only changes the saved configuration; reload to apply it now
firewall-cmd --reload

As others have pointed out, closed ports could be one reason. But I've also found a couple more.

Recent versions of Docker are suffering from massive proxy issues:

According to this comment, the fix is "likely" to make it into Docker version 17.11, and it is being "considered" for a patch release of 17.09.

All my ports are open, and the NO_PROXY hack described in the aforementioned links did not work.

I tried all Docker versions from 17.05 all the way to 17.11.0-ce-rc3, build 5b4af4f, with no success, which led me to suspect the culprit might be a recent upgrade of Vagrant (I am using 2.0.1) and/or VirtualBox (using 5.1.30). Upgrading either one of these two usually leads to all kinds of random problems. But instead of downgrading these guys, I tried to upgrade the Vagrant boxes I run.

In my two-machine setup, I switched the first node's box to fso/artful64-desktop and the second node's box to fso/artful64 (both version 2017-11-01). To my surprise, this made Docker Swarm work on versions 17.10.0-ce and 17.11.0-ce-rc3, build 5b4af4f. Please note that private networking is broken on Vagrant 2.0.1 if you want to use Ubuntu 17.10 boxes lol (it can be fixed manually).

The error message we were facing was not exactly the same, but quite similar:

Error response from daemon: rpc error: code = Unavailable desc = grpc: the connection is unavailable

In our case, we had added proxy settings to the Docker daemon in order to reach Docker Hub images from behind our corporate proxy. So when we tried to join a worker to the manager with docker swarm join, the request went to the proxy instead.

Solution: Add the swarm manager to the Docker daemon's NO_PROXY environment variable and you are good to go. This answer tells you how.
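For illustration, here is a rough sketch of how the NO_PROXY value and a matching systemd drop-in could be put together. It assumes the daemon's proxy settings live in a systemd drop-in, as in Docker's systemd proxy documentation. The helper names append_no_proxy and docker_no_proxy_dropin and the manager address 192.168.1.10 are made up for the example.

```shell
# Append a host to a comma-separated NO_PROXY list unless it is already listed.
# Usage: append_no_proxy CURRENT_LIST HOST
append_no_proxy() {
    case ",$1," in
        *",$2,"*) printf '%s\n' "$1" ;;      # already present: unchanged
        ",,")     printf '%s\n' "$2" ;;      # empty list: host becomes the list
        *)        printf '%s\n' "$1,$2" ;;   # otherwise append
    esac
}

# Print a systemd drop-in that sets NO_PROXY for the Docker daemon.
# Usage: docker_no_proxy_dropin NO_PROXY_VALUE
docker_no_proxy_dropin() {
    printf '[Service]\nEnvironment="NO_PROXY=%s"\n' "$1"
}

# Example (placeholder manager address):
#   docker_no_proxy_dropin "$(append_no_proxy "localhost,127.0.0.1" 192.168.1.10)"
```

The printed text could be saved to a drop-in under /etc/systemd/system/docker.service.d/ (the file name is up to you), followed by sudo systemctl daemon-reload and sudo systemctl restart docker. If an existing drop-in already sets NO_PROXY, it is probably simpler to edit that file instead.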

More info about it is available in the Docker Forum:

https://forums.docker.com/t/error-response-from-daemon-rpc-error-code-unavailable-desc-grpc-the-connection-is-unavailable/39066

As other people mentioned, adding the additional ports to firewalld resolves the issue:

sudo firewall-cmd --add-port=2376/tcp --permanent  
sudo firewall-cmd --add-port=2377/tcp --permanent  
sudo firewall-cmd --add-port=7946/tcp --permanent  
sudo firewall-cmd --add-port=7946/udp --permanent  
sudo firewall-cmd --add-port=4789/udp --permanent

Remember to restart the firewall after opening the ports:

sudo systemctl restart firewalld
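After the restart, it may be worth confirming that the rules actually took effect. The sketch below compares the output of firewall-cmd --list-ports against the four port/protocol pairs from the Docker docs. The helper name missing_swarm_ports is invented here, and the check is a plain string match, so ports opened through a range (for example 7000-8000/tcp) or through a firewalld service definition would be reported as missing.

```shell
# Port/protocol pairs the Docker docs list for swarm mode.
REQUIRED_PORTS="2377/tcp 7946/tcp 7946/udp 4789/udp"

# Print each required port that does not appear in the given list.
# Usage: missing_swarm_ports "$(sudo firewall-cmd --list-ports)"
missing_swarm_ports() {
    for want in $REQUIRED_PORTS; do
        case " $1 " in
            *" $want "*) ;;                    # found: nothing to report
            *) printf '%s\n' "$want" ;;        # not found: print it
        esac
    done
}
```

Empty output means all four entries are open in the default zone.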

An easier one from the official docs:

  1. Re-init the swarm manager:

    • Take down the swarm with docker swarm leave --force
    • Re-init with docker swarm init --advertise-addr [ip of the machine, check it with 'docker-machine ls']:2377 (2377 is the port for swarm joins)
  2. Then add your machine to the swarm with docker-machine ssh myvm2 "docker swarm join --token <token> <ip>:<port>"
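The steps above can be sketched as two small shell functions, one to run on the manager and one on the joining node. This is only an outline under assumptions: the function names are made up, the address handling covers plain IPv4 addresses or hostnames only, and docker swarm leave --force on a manager destroys the existing swarm, so it is only appropriate when losing that state is acceptable.

```shell
SWARM_PORT=2377

# Append the default swarm port when the address does not carry one.
# Usage: with_swarm_port ADDRESS
with_swarm_port() {
    case $1 in
        *:*) printf '%s\n' "$1" ;;
        *)   printf '%s:%s\n' "$1" "$SWARM_PORT" ;;
    esac
}

# Run on the manager: drop the old swarm, create a new one, print the worker token.
# Usage: reinit_manager MANAGER_IP
reinit_manager() {
    docker swarm leave --force
    docker swarm init --advertise-addr "$(with_swarm_port "$1")"
    docker swarm join-token -q worker
}

# Run on the node that should join.
# Usage: join_worker TOKEN MANAGER_IP
join_worker() {
    docker swarm join --token "$1" "$(with_swarm_port "$2")"
}
```

With docker-machine, the second function corresponds to what the docker-machine ssh myvm2 "docker swarm join ..." command runs inside the VM.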

Temporarily solved by flushing iptables, but that was a bad idea! After that, cloning images didn't work because Docker could not find the appropriate iptables chain "docker".

It is indeed a firewall issue, more precisely firewalld (CentOS 7).
Solved the issue by allowing the appropriate ports through firewalld, as mentioned in @sanjaykumar81's answer.

Ensure that firewalld on systemd machines allows the ports mentioned in the Docker docs:

The following ports must be available. On some systems, these ports are open by default.

TCP port 2377 for cluster management communications
TCP and UDP port 7946 for communication among nodes
UDP port 4789 for overlay network traffic

Ensure that the appropriate TCP/UDP ports are enabled.

error: desc = "transport: x509: certificate has expired or is not yet valid"

This error can appear at times when the clocks of the leader and the worker node are out of sync. It can be resolved by running chronyd or ntpd.
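To see whether clock drift is a plausible cause before setting up time sync, the two clocks can be compared by hand. The snippet below is a small sketch that takes two epoch timestamps (as printed by date +%s) and reports the difference. The helper names, the host name manager-host, and the 60-second threshold are all arbitrary choices for illustration, not a limit documented by Docker.

```shell
# Absolute difference between two epoch timestamps, in seconds.
# Usage: clock_skew EPOCH_A EPOCH_B
clock_skew() {
    diff=$(( $1 - $2 ))
    [ "$diff" -lt 0 ] && diff=$(( 0 - $diff ))
    printf '%s\n' "$diff"
}

# Succeed if the two timestamps differ by at most MAX seconds.
# Usage: skew_within EPOCH_A EPOCH_B MAX
skew_within() {
    [ "$(clock_skew "$1" "$2")" -le "$3" ]
}

# Example (placeholder host name; ssh latency adds a little noise):
#   remote=$(ssh manager-host date +%s)
#   skew_within "$(date +%s)" "$remote" 60 || echo "clocks differ noticeably"
```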

