Docker Swarm and Docker Service

Swarm Gurus,

I have just set up my very first Docker Swarm environment with 3 hosts, by following the manuals here:

https://docs.docker.com/engine/install/ubuntu/
https://docs.docker.com/engine/swarm/swarm-tutorial/
https://docs.docker.com/engine/swarm/swarm-tutorial/create-swarm/
https://docs.docker.com/engine/swarm/swarm-tutorial/deploy-service/
https://docs.docker.com/engine/swarm/swarm-tutorial/scale-service/

I was able to set it up and create a service with 5 replicas, which worked as expected. The containers were spread across 3 nodes (a manager and 2 worker nodes).

Then I started to experiment by shutting down all 3 nodes and starting them up again. The service I had created (named helloworld) was automatically brought back up by Docker and restored across the swarm.

But I noticed one thing: the original containers were no longer there, and instead I got this:

someuser@manager:~$ docker service ps helloworld --no-trunc
ID                          NAME               IMAGE                                                                                   NODE      DESIRED STATE   CURRENT STATE            ERROR                                                         PORTS
8vlswsfq8ub5xn9vd401ilskn   helloworld.1       alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   manager   Running         Running 30 minutes ago
jqfgg41xppf7xcchnkvjyesyx    \_ helloworld.1   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   manager   Shutdown        Failed 30 minutes ago    "No such container: helloworld.1.jqfgg41xppf7xcchnkvjyesyx"
wy382jy2yncpv6b3y1y0qfq3h   helloworld.2       alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   manager   Running         Running 30 minutes ago
mq7w469vck8hzr7p9w22f0rt1    \_ helloworld.2   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   manager   Shutdown        Failed 30 minutes ago    "No such container: helloworld.2.mq7w469vck8hzr7p9w22f0rt1"
jp5wbvbdxxgh60vzef9iz73aj   helloworld.3       alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker01   Running         Running 30 minutes ago
t5wgad0dhu5hoyp3kjrdela4b    \_ helloworld.3   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker01   Shutdown        Failed 30 minutes ago    "No such container: helloworld.3.t5wgad0dhu5hoyp3kjrdela4b"
km03jrxlvam162i8pt2ix6vlf   helloworld.4       alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker02   Running         Running 29 minutes ago
8hjnbjz4nmpqncmva4ubeqpx6    \_ helloworld.4   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker02   Shutdown        Failed 30 minutes ago    "No such container: helloworld.4.8hjnbjz4nmpqncmva4ubeqpx6"
knbvl6el13l0poofdv1g6j11z   helloworld.5       alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker02   Running         Running 29 minutes ago
thlnyngdbwwsi30fuxx4wx7cd    \_ helloworld.5   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker02   Shutdown        Failed 30 minutes ago    "No such container: helloworld.5.thlnyngdbwwsi30fuxx4wx7cd"

I am totally fine with the new containers, since I did not shut the nodes down gracefully; an ungraceful shutdown is part of the test case.

But I want to get rid of the entries that have failed, namely the following:

jqfgg41xppf7xcchnkvjyesyx    \_ helloworld.1   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   manager   Shutdown        Failed 30 minutes ago    "No such container: helloworld.1.jqfgg41xppf7xcchnkvjyesyx"
mq7w469vck8hzr7p9w22f0rt1    \_ helloworld.2   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   manager   Shutdown        Failed 30 minutes ago    "No such container: helloworld.2.mq7w469vck8hzr7p9w22f0rt1"
t5wgad0dhu5hoyp3kjrdela4b    \_ helloworld.3   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker01   Shutdown        Failed 30 minutes ago    "No such container: helloworld.3.t5wgad0dhu5hoyp3kjrdela4b"
8hjnbjz4nmpqncmva4ubeqpx6    \_ helloworld.4   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker02   Shutdown        Failed 30 minutes ago    "No such container: helloworld.4.8hjnbjz4nmpqncmva4ubeqpx6"
thlnyngdbwwsi30fuxx4wx7cd    \_ helloworld.5   alpine:latest@sha256:21a3deaa0d32a8057914f36584b5288d2e5ecc984380bc0118285c70fa8c9300   worker02   Shutdown        Failed 30 minutes ago    "No such container: helloworld.5.thlnyngdbwwsi30fuxx4wx7cd"

I tried the following:

$ docker rm \_ helloworld.1
$ docker rm \helloworld.1.jqfgg41xppf7xcchnkvjyesyx
$ docker rm --link \_ helloworld.1
$ docker rm --link \helloworld.1.jqfgg41xppf7xcchnkvjyesyx

But none of these worked.

Your advice is much appreciated.

Thanks

docker service ps lists all the tasks associated with a service, and tasks can be in a variety of states: starting, running, complete, failed, shutdown, etc.

A running task is associated with a container.

The utility of tracking tasks independently is that you can take a task ID from the docker service ps list and use it, rather than the service ID, in some docker commands. For example, docker service logs <task id> can show specifically why a particular task failed.
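As a rough sketch of that workflow, the two helpers below pull out the IDs of a service's shut-down tasks and then fetch the logs for one of them. The function names shutdown_task_ids and task_logs are made up for illustration; the docker options they wrap (--filter desired-state=..., --format) are the ones I know from docker service ps, and I am assuming they are run on a manager node.

```shell
# Hypothetical helper: print the IDs of all tasks of a service whose
# desired state is "shutdown" (the "\_" rows in docker service ps).
# Usage: shutdown_task_ids helloworld
shutdown_task_ids() {
  docker service ps "$1" --filter "desired-state=shutdown" --format '{{.ID}}'
}

# Hypothetical helper: show the logs of a single task, by task ID.
# Usage: task_logs jqfgg41xppf7xcchnkvjyesyx
task_logs() {
  docker service logs "$1"
}
```

If the container behind a task is already gone, as the "No such container" errors above suggest, the logs may well be empty or unavailable.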

You can also run docker inspect <task id>, which returns a block of data that may indicate why a task could not be started at all. If the task did start, the data includes the ID of the container that actually ran it, which you can use to go to the actual node and check for things like OOM errors or in-container logs.
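A minimal sketch of picking those two pieces out of the inspect output, assuming the task JSON has the Status.Err and Status.ContainerStatus.ContainerID fields as I remember them. The function names are hypothetical, and for a task that never got a container the second template may print nothing or fail.

```shell
# Hypothetical helper: print the error message recorded for a task, if any.
# Usage: task_error jqfgg41xppf7xcchnkvjyesyx
task_error() {
  docker inspect --type task --format '{{.Status.Err}}' "$1"
}

# Hypothetical helper: print the ID of the container that ran the task.
# Assumes the task actually started; otherwise the field may be missing.
# Usage: task_container_id 8vlswsfq8ub5xn9vd401ilskn
task_container_id() {
  docker inspect --type task --format '{{.Status.ContainerStatus.ContainerID}}' "$1"
}
```

With the container ID in hand, you would log in to the node shown in the NODE column and use the ordinary docker logs or docker inspect commands there.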

You can clean up the containers associated with finished tasks, but Docker automatically retains task history according to the swarm's task history limit (the --task-history-limit option of docker swarm update). Setting this value smaller keeps the history smaller, but you still can't (and really would not want to) clear it outright.
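To make that concrete, here is a sketch under a few assumptions: that the retention of old task entries is governed by the swarm-wide task history limit set with docker swarm update --task-history-limit, that docker info on a manager prints a "Task History Retention Limit" line, and that swarm-created containers carry the com.docker.swarm.service.name label. The function names are hypothetical.

```shell
# Hypothetical helper: show the current task history retention limit
# as reported by docker info on a manager node.
show_task_history_limit() {
  docker info | grep -i 'Task History Retention Limit'
}

# Hypothetical helper: change the limit (run on a manager).
# Usage: set_task_history_limit 2
set_task_history_limit() {
  docker swarm update --task-history-limit "$1"
}

# Hypothetical helper: list exited containers left behind by a service's
# tasks on the current node, so they can be removed with docker rm.
# Usage: exited_service_containers helloworld
exited_service_containers() {
  docker ps -a \
    --filter "label=com.docker.swarm.service.name=$1" \
    --filter "status=exited" \
    --format '{{.ID}}'
}
```

In the situation from the question the old containers no longer exist at all, so the last helper would probably return nothing; the "\_" rows are task history kept by the swarm, not containers that docker rm can reach.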
