
Docker swarm service can be accessed only on the node where the container is running

I'm currently running Docker swarm on 3 nodes. First I created a network:

docker network create -d overlay xx_net

and after that a service:

docker service create --network xxx_net --replicas 1 -p 12345:12345 --name nameofservice nameofimage:1

If I read correctly, this is the routing mesh (which is OK for me). But I can only access the service on the IP of the node where the container is running, even though it should be available on every node's IP.
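To confirm that the port really is published in ingress (routing-mesh) mode rather than host mode, the service's endpoint can be inspected. A minimal sketch, using the service name from the command above; the output shown in the comment is an assumption of what a routing-mesh publish should look like:

docker service inspect nameofservice --format '{{ json .Endpoint.Ports }}'
# expected, roughly: [{"Protocol":"tcp","TargetPort":12345,"PublishedPort":12345,"PublishMode":"ingress"}]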

If I drain a node, the container starts up on a different node and is then available only on that new node's IP.


**More information added below:**

  • I rebooted all servers - 3 workers, one of which is the manager
  • after the reboot, everything seemed to work OK!
  • I'm using the rabbitmq image from Docker Hub. The Dockerfile is quite small: FROM rabbitmq:3-management. The container was started on worker 2.
  • I can connect to rabbitmq's management page from all workers: worker1-ip:15672, worker2-ip:15672, worker3-ip:15672, so I think all the needed ports are open.
  • after about 1 hour, the rabbitmq container was moved from worker 2 to worker 3 - I do not know the reason.
  • after that I can no longer connect via worker1-ip:15672 or worker2-ip:15672, but worker3-ip:15672 still works!
  • I drained worker3 with docker node update --availability drain worker3 (the status commands sketched after this list show where the task ends up)
  • the container started on worker1.
  • after that I can only connect via worker1-ip:15672, no longer via worker2 or worker3
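As referenced in the drain step above, the swarm's own status commands show where the task is scheduled and whether every node is still part of the cluster. A minimal sketch, run on the manager with the service name from above:

docker node ls                       # node status, availability and manager role
docker service ps nameofservice      # which node the rabbitmq task is currently running on
docker service inspect nameofservice --format '{{ json .Endpoint.VirtualIPs }}'   # the service VIPs on the ingress/overlay networks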

One more test: Docker was restarted on all workers, and everything works again?! Let's wait a few hours...

Today's status: 2 of 3 nodes are working OK. From the manager's service log:

Jul 12 07:53:32 dockerswarmmanager dockerd[7180]: time="2017-07-12T07:53:32.787953754Z" level=info msg="memberlist: Marking dockerswarmworker2-459b4229d652 as failed, suspect timeout reached"
Jul 12 07:53:39 dockerswarmmanager dockerd[7180]: time="2017-07-12T07:53:39.787783458Z" level=info msg="memberlist: Marking dockerswarmworker2-459b4229d652 as failed, suspect timeout reached"
Jul 12 07:55:27 dockerswarmmanager dockerd[7180]: time="2017-07-12T07:55:27.790564790Z" level=info msg="memberlist: Marking dockerswarmworker2-459b4229d652 as failed, suspect timeout reached"
Jul 12 07:55:41 dockerswarmmanager dockerd[7180]: time="2017-07-12T07:55:41.787974530Z" level=info msg="memberlist: Marking dockerswarmworker2-459b4229d652 as failed, suspect timeout reached"
Jul 12 07:56:33 dockerswarmmanager dockerd[7180]: time="2017-07-12T07:56:33.027525926Z" level=error msg="logs call failed" error="container not ready for logs: context canceled" module="node/agent/taskmanager" node.id=b6vnaouyci7b76ol1apq96zxx
Jul 12 07:56:33 dockerswarmmanager dockerd[7180]: time="2017-07-12T07:56:33.027668473Z" level=error msg="logs call failed" error="container not ready for logs: context canceled" module="node/agent/taskmanager" node.id=b6vnaouyci7b76ol1apq96zxx
Jul 12 08:13:22 dockerswarmmanager dockerd[7180]: time="2017-07-12T08:13:22.787796692Z" level=info msg="memberlist: Marking dockerswarmworker2-03ec8453a81f as failed, suspect timeout reached"
Jul 12 08:21:37 dockerswarmmanager dockerd[7180]: time="2017-07-12T08:21:37.788694522Z" level=info msg="memberlist: Marking dockerswarmworker2-03ec8453a81f as failed, suspect timeout reached"
Jul 12 08:24:01 dockerswarmmanager dockerd[7180]: time="2017-07-12T08:24:01.525570127Z" level=error msg="logs call failed" error="container not ready for logs: context canceled" module="node/agent/taskmanager" node.id=b6vnaouyci7b76ol1apq96zxx
Jul 12 08:24:01 dockerswarmmanager dockerd[7180]: time="2017-07-12T08:24:01.525713893Z" level=error msg="logs call failed" error="container not ready for logs: context canceled" module="node/agent/taskmanager" node.id=b6vnaouyci7b76ol1apq96zxx
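The repeated "memberlist: Marking ... as failed, suspect timeout reached" messages mean the manager temporarily loses gossip contact with worker2, and the routing mesh depends on exactly this traffic. Swarm needs TCP 2377 (cluster management), TCP and UDP 7946 (node-to-node gossip used by memberlist) and UDP 4789 (VXLAN overlay data) open between all nodes. A hedged sketch for checking reachability from one node to another (worker2-ip is a placeholder):

# from the manager (or any other node), check the TCP ports
nc -zv worker2-ip 2377
nc -zv worker2-ip 7946

# UDP probes with nc are not fully reliable, but the firewall must allow these ports as well
nc -zvu worker2-ip 7946
nc -zvu worker2-ip 4789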

And from the worker's Docker log:

Jul 12 08:20:47 dockerswarmworker2 dockerd[677]: time="2017-07-12T08:20:47.486202716Z" level=error msg="Bulk sync to node h999-99-999-185.scenegroup.fi-891b24339f8a timed out"
Jul 12 08:21:38 dockerswarmworker2 dockerd[677]: time="2017-07-12T08:21:38.288117026Z" level=warning msg="memberlist: Refuting a dead message (from: h999-99-999-185.scenegroup.fi-891b24339f8a)"
Jul 12 08:21:39 dockerswarmworker2 dockerd[677]: time="2017-07-12T08:21:39.404554761Z" level=warning msg="Neighbor entry already present for IP 10.255.0.3, mac 02:42:0a:ff:00:03"
Jul 12 08:21:39 dockerswarmworker2 dockerd[677]: time="2017-07-12T08:21:39.404588738Z" level=warning msg="Neighbor entry already present for IP 104.198.180.163, mac 02:42:0a:ff:00:03"
Jul 12 08:21:39 dockerswarmworker2 dockerd[677]: time="2017-07-12T08:21:39.404609273Z" level=warning msg="Neighbor entry already present for IP 10.255.0.6, mac 02:42:0a:ff:00:06"
Jul 12 08:21:39 dockerswarmworker2 dockerd[677]: time="2017-07-12T08:21:39.404622776Z" level=warning msg="Neighbor entry already present for IP 104.198.180.163, mac 02:42:0a:ff:00:06"
Jul 12 08:21:47 dockerswarmworker2 dockerd[677]: time="2017-07-12T08:21:47.486007317Z" level=error msg="Bulk sync to node h999-99-999-185.scenegroup.fi-891b24339f8a timed out"
Jul 12 08:22:47 dockerswarmworker2 dockerd[677]: time="2017-07-12T08:22:47.485821037Z" level=error msg="Bulk sync to node h999-99-999-185.scenegroup.fi-891b24339f8a timed out"
Jul 12 08:23:17 dockerswarmworker2 dockerd[677]: time="2017-07-12T08:23:17.630602898Z" level=error msg="Bulk sync to node h999-99-999-185.scenegroup.fi-891b24339f8a timed out"

And this one from the working worker:

Jul 12 08:33:09 h999-99-999-185.scenegroup.fi dockerd[10330]: time="2017-07-12T08:33:09.219973777Z" level=warning msg="Neighbor entry already present for IP 10.0.0.3, mac xxxxx"
Jul 12 08:33:09 h999-99-999-185.scenegroup.fi dockerd[10330]: time="2017-07-12T08:33:09.220539013Z" level=warning msg="Neighbor entry already present for IP "managers ip here", mac xxxxxx"

I restarted Docker on the problematic worker and it started to work again. I'll keep following...
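Restarting the daemon forces the node to rejoin the gossip cluster and rebuild its overlay networking state, which matches the observation that things work again afterwards. A minimal sketch, assuming systemd:

sudo systemctl restart docker
journalctl -u docker -f    # follow the daemon log to watch it rejoin the swarm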

**Today's results:**

  • 2 of the workers are available, one is not
  • I didn't do a thing
  • after 4 hours of leaving the swarm alone, everything seems to work again?!
  • services have been moved from one worker to another without any good reason; all of this looks like a communication problem.
  • quite confusing.

Upgrade to Docker 17.06.

Ingress overlay networking was broken for a long time, until about 17.06-rc3.
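To verify which engine version each node actually runs before and after the upgrade, a quick check (a sketch; run docker version on every node, or look at the manager's node list):

docker version --format '{{ .Server.Version }}'    # engine version of the local daemon
docker node ls                                     # newer clients also show an ENGINE VERSION column here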
