
Docker Node is Down after service restart

It seems my server ran out of space and I was having some problems with some of the deployed docker stacks. Took me a while to figure it out, but eventually I did and removed a couple of containers and images to free some space.

I was able to run service docker restart and it worked. However, there are some problems:

  • docker info says the swarm is "Pending"
  • docker node ls shows the only node I have (the Leader); its availability is Active, but its status is Down
  • journalctl -f -u docker says `level=error msg="error removing task " error="incompatible value module=node/agent/worker node.id="`

When running docker service ls , all services have 0/1 replicas.

This is the node's status as JSON (the format printed by docker node inspect):

"Status": {
    "State": "down",
    "Message": "heartbeat failure for node in \"unknown\" state",
    "Addr": "<ip and port>"
},
"ManagerStatus": {
    "Leader": true,
    "Reachability": "reachable",
    "Addr": "<ip and port>"
}

How can I get my services running again?

Sometimes, when you restart Docker or update its version, the tasks.db file gets corrupted.

This is an open issue ( #34827 ). Some people have suggested a workaround: move the tasks.db file aside, test whether that fixes the problem, and then delete the file. Docker will automatically create a new one for you.

You can find the tasks.db file in /var/lib/docker/swarm/worker/
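
The "move it first, delete it later" workaround can be sketched as a small shell function. This is only a sketch: move_tasks_db is a hypothetical helper name, the timestamped .bak suffix is my own choice, and stopping the daemon while the file is moved is an assumption (the answer does not say whether that is required).

```shell
#!/bin/sh
# Sketch: move tasks.db aside instead of deleting it right away.
# move_tasks_db is a hypothetical helper; the default directory is the
# one named in the answer above.

move_tasks_db() {
    worker_dir="${1:-/var/lib/docker/swarm/worker}"
    db="$worker_dir/tasks.db"

    if [ ! -f "$db" ]; then
        echo "no tasks.db found in $worker_dir" >&2
        return 1
    fi

    # Keep the old file under a timestamped name so it can be restored.
    backup="$db.bak.$(date +%Y%m%d%H%M%S)"
    mv "$db" "$backup" && echo "$backup"
}

# Possible use, as root (stopping the daemon first is an assumption):
#   systemctl stop docker
#   move_tasks_db
#   systemctl start docker
```

If the node comes back as Ready afterwards, the printed backup file can be deleted; if not, it can be moved back into place.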

I faced the same issue recently and this workaround saved my day. I didn't lose any data related to my stacks.

Update October/19/2020

Issue ( #34827 ) is now closed, but the solution is still the same: remove the tasks.db file.

Option 1:

Wait. Sometimes it fixes itself.

Option 2 (May vary depending on OS):

systemctl stop docker
rm -Rf /var/lib/docker/swarm
systemctl start docker
docker swarm init
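
Removing /var/lib/docker/swarm discards the node's swarm state, and docker swarm init then creates a new, empty swarm, so stacks would need to be redeployed afterwards. A cautious variant is to archive the directory before deleting it. The sketch below assumes tar is available; archive_swarm_dir and the /root default output directory are hypothetical choices, not part of the original answer.

```shell
#!/bin/sh
# Sketch: archive the swarm directory before wiping it.
# archive_swarm_dir is a hypothetical helper name.

archive_swarm_dir() {
    swarm_dir="${1:-/var/lib/docker/swarm}"
    out_dir="${2:-/root}"

    if [ ! -d "$swarm_dir" ]; then
        echo "missing directory: $swarm_dir" >&2
        return 1
    fi

    archive="$out_dir/swarm-backup-$(date +%Y%m%d%H%M%S).tar.gz"
    # Store paths relative to the parent so the archive unpacks as "swarm/...".
    tar -czf "$archive" -C "$(dirname "$swarm_dir")" "$(basename "$swarm_dir")" \
        && echo "$archive"
}

# Possible use, as root, between "systemctl stop docker" and the rm step:
#   archive_swarm_dir
```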

I found the following solution: https://forums.docker.com/t/docker-worker-nodes-shown-as-down-after-re-start/22329

After the docker service was restarted, the leader node was down.

I fixed this by promoting the worker node to a manager and then, on the new manager node, demoting the failed leader node.

ubuntu@staging1:~$ docker node ls
ID                          HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
y0363og32cur9xq9yy0nqg6j9 * staging1  Down    Active        Reachable
x68yyqtt0rogmabec552634mf   staging2  Ready   Active

ubuntu@staging1:~$ docker node promote staging2

root@staging1:~# docker node ls
ID                          HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
plxkuqqnkxotrzy7nhjj27w34 * staging1  Down    Active        Leader
x68yyqtt0rogmabec552634mf   staging2  Ready   Active        Reachable

root@staging2:~# docker node demote staging1

root@staging2:~# docker node ls
ID                          HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
plxkuqqnkxotrzy7nhjj27w34   staging1  Down    Active
x68yyqtt0rogmabec552634mf * staging2  Ready   Active        Leader

root@staging2:~# docker node rm staging1

Get join-token from leader node:
root@staging2:~# docker swarm join-token manager

Reconnect failed node to docker swarm cluster:

root@staging1:~# docker swarm leave --force
root@staging1:~# systemctl stop docker
root@staging1:~# rm -rf /var/lib/docker/swarm/
root@staging1:~# systemctl start docker
root@staging1:~# docker swarm join --token XXXXXXXX 192.168.XX.XX:2377

root@staging1:~# docker node ls
ID                          HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
y0363og32cur9xq9yy0nqg6j9 * staging1  Ready   Active        Reachable
x68yyqtt0rogmabec552634mf   staging2  Ready   Active        Leader

root@staging1:~# docker node demote staging2

root@staging1:~# docker node ls
ID                          HOSTNAME  STATUS  AVAILABILITY  MANAGER STATUS
y0363og32cur9xq9yy0nqg6j9 * staging1  Ready   Active        Leader
x68yyqtt0rogmabec552634mf   staging2  Ready   Active
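
The rejoin steps run on the failed node (staging1 above) can be collected into one function. This is a sketch, not a tested recovery tool: run and rejoin_swarm are hypothetical helper names, and the dry-run default is my own addition so that nothing destructive happens until DRY_RUN=0 is set explicitly.

```shell
#!/bin/sh
# Sketch: the "reconnect failed node" sequence from the transcript above.
# run and rejoin_swarm are hypothetical helpers. With DRY_RUN unset or 1,
# commands are only printed, never executed.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rejoin_swarm() {
    if [ "$#" -ne 2 ]; then
        echo "usage: rejoin_swarm <join-token> <manager-addr:port>" >&2
        return 2
    fi
    token="$1"
    manager="$2"

    run docker swarm leave --force
    run systemctl stop docker
    run rm -rf /var/lib/docker/swarm/
    run systemctl start docker
    run docker swarm join --token "$token" "$manager"
}

# Preview the commands first, then execute them as root:
#   rejoin_swarm XXXXXXXX 192.168.XX.XX:2377
#   DRY_RUN=0; rejoin_swarm XXXXXXXX 192.168.XX.XX:2377
```

The token and address are the values printed by docker swarm join-token manager on the current leader.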

First, check the details of the node:

docker node ls

If the node's status still shows as down while its availability is active, the service running on the node may have stopped. Create the service in global mode, or update the global service running in the swarm with the following command:

docker service update --force <service-name>
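
To spot down nodes without reading the whole table, the output of docker node ls can be filtered. A minimal sketch, assuming the --format placeholders .Hostname and .Status and an available awk; down_nodes is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: print the hostnames of nodes whose status is Down.
# down_nodes is a hypothetical helper. It reads "HOSTNAME STATUS" pairs
# on stdin, one node per line.

down_nodes() {
    awk 'tolower($2) == "down" { print $1 }'
}

# Possible use on a manager node:
#   docker node ls --format '{{.Hostname}} {{.Status}}' | down_nodes
```

Reading from stdin keeps the filter usable on saved output as well as on a live swarm.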
