It seems my server ran out of space, and some of the deployed Docker stacks started having problems. It took me a while to figure out, but eventually I removed a couple of containers and images to free up some space. I was then able to run `service docker restart` and it worked. However, there are still some problems:
- `docker info` says the swarm is "Pending".
- `docker node ls` shows the only node I have (the Leader); its availability is Active, but its status is Down.
- `journalctl -f -u docker` logs `level=error msg="error removing task " error="incompatible value module=node/agent/worker node.id="`.
- When running `docker service ls`, all services show 0/1 replicas.
This is the node status reported by `docker node inspect`:
"Status": {
"State": "down",
"Message": "heartbeat failure for node in \"unknown\" state",
"Addr": "<ip and port>"
},
"ManagerStatus": {
"Leader": true,
"Reachability": "reachable",
"Addr": "<ip and port>"
}
How can I get my services running again?
Sometimes when you restart Docker or update your Docker version, the tasks.db file gets corrupted.
This is a known issue (#34827). Some people have suggested a workaround: move the tasks.db file aside and test whether that fixes the issue; if it does, delete the old tasks.db file. Docker will automatically create a new one for you.
You can find the tasks.db file in `/var/lib/docker/swarm/worker/`.
I faced the same issue recently, and this workaround saved my day. I didn't lose any data related to my stacks.
Update (October 19, 2020): issue #34827 is closed, but the solution is still the same: remove the tasks.db file.
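The workaround above can be sketched as a short shell session. It moves tasks.db aside rather than deleting it outright, so it can be restored if this doesn't help (the path is the default Docker data root; adjust if yours differs):

```shell
# Stop Docker so the swarm state files are not in use
systemctl stop docker

# Move tasks.db aside instead of deleting it, so it can be restored
# if this workaround does not fix the problem
TASKS_DB=/var/lib/docker/swarm/worker/tasks.db
mv "$TASKS_DB" "${TASKS_DB}.bak"

# On restart, Docker recreates a fresh tasks.db
systemctl start docker

# If the services come back up (docker service ls shows N/N replicas),
# the backup can be removed:
# rm "${TASKS_DB}.bak"
```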
Option 1:
Wait. Sometimes it fixes itself.
Option 2 (may vary depending on OS; note that this wipes all swarm state, including services, secrets, and configs, so stacks must be redeployed afterwards):
systemctl stop docker
rm -Rf /var/lib/docker/swarm
systemctl start docker
docker swarm init
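After the re-init, the swarm state can be verified; `docker info` accepts a Go template for the swarm field (a sketch, assuming systemd and a single-node swarm):

```shell
# Should print "active" once the node has re-initialised the swarm
docker info --format '{{.Swarm.LocalNodeState}}'

# The single node should now show STATUS Ready and MANAGER STATUS Leader
docker node ls
```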
I found the following solution: https://forums.docker.com/t/docker-worker-nodes-shown-as-down-after-re-start/22329
After the Docker service was restarted, the leader node was down.
I fixed this by promoting a worker node to manager, and then, on the new manager node, demoting the failed leader node.
ubuntu@staging1:~$ docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
y0363og32cur9xq9yy0nqg6j9 * staging1 Down Active Reachable
x68yyqtt0rogmabec552634mf staging2 Ready Active
ubuntu@staging1:~$ docker node promote staging2
root@staging1:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
plxkuqqnkxotrzy7nhjj27w34 * staging1 Down Active Leader
x68yyqtt0rogmabec552634mf staging2 Ready Active Reachable
root@staging2:~# docker node demote staging1
root@staging2:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
plxkuqqnkxotrzy7nhjj27w34 staging1 Down Active
x68yyqtt0rogmabec552634mf * staging2 Ready Active Leader
root@staging2:~# docker node rm staging1
Get join-token from leader node:
root@staging2:~# docker swarm join-token manager
Reconnect failed node to docker swarm cluster:
root@staging1:~# docker swarm leave --force
root@staging1:~# systemctl stop docker
root@staging1:~# rm -rf /var/lib/docker/swarm/
root@staging1:~# systemctl start docker
root@staging1:~# docker swarm join --token XXXXXXXX 192.168.XX.XX:2377
root@staging1:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
y0363og32cur9xq9yy0nqg6j9 * staging1 Ready Active Reachable
x68yyqtt0rogmabec552634mf staging2 Ready Active Leader
root@staging1:~# docker node demote staging2
root@staging1:~# docker node ls
ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS
y0363og32cur9xq9yy0nqg6j9 * staging1 Ready Active Leader
x68yyqtt0rogmabec552634mf staging2 Ready Active
First, check the details of the node:
docker node ls
If the node's status still shows Down while its availability is Active, a service running on that node may have stopped. Create the service in global mode, or force-update the services running in the swarm:
docker service update --force <service-name>
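`docker service update --force` takes a service name; to nudge every service in the swarm, the update can be looped over `docker service ls` (a sketch; Docker service names contain no whitespace, so the unquoted word-split is safe here):

```shell
# Force a re-deploy of each service in the swarm. --force re-schedules
# the tasks even when nothing in the service spec has changed.
for svc in $(docker service ls --format '{{.Name}}'); do
  docker service update --force "$svc"
done
```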