How to debug "WSREP: SST failed: 1 (Operation not permitted)" with a MariaDB Galera cluster in Docker?

Question

Requirement: CentOS-based Docker container providing a MariaDB 10.x Galera cluster

Host Environment: OS X El Capitan 10.11.6, Docker 1.12.5 (14777)

Docker Container OS: CentOS Linux release 7.3.1611 (Core)

DB: 10.1.20-MariaDB

I found a promising Docker image, but the documentation seems to be obsolete: the commands to start the cluster do not work. At the time of writing, the image uses wsrep_sst_method = rsync, so I figured that the following commands should work (replace /Users/Me/somedb with an empty directory on your host):

docker pull dayreiner/centos7-mariadb-10.1-galera

docker run -d --name db1 -h db1host -p 3306:3306 -e CLUSTER_NAME=joe -e CLUSTER=BOOTSTRAP -e MYSQL_ROOT_PASSWORD='pwd' -v /Users/Me/somedb:/var/lib/mysql dayreiner/centos7-mariadb-10.1-galera:latest

docker run -d --name db2 -h db2host -p 3307:3306 --link db1 -e CLUSTER_NAME=joe -e CLUSTER=db1host,db2host -e MYSQL_ROOT_PASSWORD='pwd' -v /Users/Me/somedb:/var/lib/mysql dayreiner/centos7-mariadb-10.1-galera:latest

The first container (db1) comes up and seems OK. But the last command, which tries to add db2 as a second node to the Galera cluster, results in the following error (docker logs db2):

2017-01-10 15:26:10 139742710823680 [Note] WSREP: New cluster view: global state: :-1, view# 0: Primary, number of nodes: 1, my index: 0, protocol version 3
2017-01-10 15:26:10 139742711142656 [ERROR] WSREP: SST failed: 1 (Operation not permitted)
2017-01-10 15:26:10 139742711142656 [ERROR] Aborting

I could not figure out what is wrong here and would appreciate ideas on how to analyze this further. Is this a problem of rsync, Galera or even Docker?

Answer 1

That's my image on dockerhub.

I had not tested the cluster (until now) on a single host, only running on multiple hosts. You're right though, running two on a single host seems to abort the second node on start.

This looks to be caused by the default bridge network not behaving nicely. Possibly some issue with handling the ports for state transfer. Not really sure why.
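One thing worth checking, as an assumption rather than a confirmed diagnosis: on Docker's default bridge network, containers do not get automatic name resolution for each other, and --link only adds a host entry in one direction, whereas a user-defined network resolves container names both ways. The sketch below uses a hypothetical helper, can_resolve, and assumes getent is available inside the image:

```shell
# Hypothetical helper: report whether container $1 can resolve hostname $2.
# Assumes getent exists inside the container (it ships with glibc on CentOS).
can_resolve() {
    if docker exec "$1" getent hosts "$2" >/dev/null 2>&1; then
        echo "$1 -> $2: resolves"
    else
        echo "$1 -> $2: does NOT resolve"
        return 1
    fi
}
```

Running can_resolve db1 db2 and can_resolve db2 db1 checks both directions; with an rsync SST the donor has to reach the joiner as well as the other way around.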

If you modify your commands to first create a custom network for your clustered containers to use on the backend, and then run the cluster members using that network, that seems to work when running two nodes on a single host:

# docker network create mariadb

# docker run -d --network=mariadb -p 3307:3306 --name db1 -e CLUSTER_NAME=test -e CLUSTER=BOOTSTRAP -e MYSQL_ROOT_PASSWORD=test -v /opt/test/db1:/var/lib/mysql dayreiner/centos7-mariadb-10.1-galera:latest

# docker run -d --network=mariadb -p 3308:3306 --name db2 -e CLUSTER_NAME=test -e CLUSTER=db1,db2 -e MYSQL_ROOT_PASSWORD=test -v /opt/test/db2:/var/lib/mysql dayreiner/centos7-mariadb-10.1-galera:latest

No errors this time on the second node:

# docker logs db2 -f
...snip
2017-01-12 20:33:08 139726185019648 [Note] WSREP: Signalling provider to continue.
2017-01-12 20:33:08 139726185019648 [Note] WSREP: SST received: 42eaa277-d906-11e6-b98a-3e6b9531c1b7:0
2017-01-12 20:33:08 139725604124416 [Note] WSREP: 1.0 (f170852fe1b6): State transfer from 0.0 (951fdda2454b) complete.
2017-01-12 20:33:08 139725604124416 [Note] WSREP: Shifting JOINER -> JOINED (TO: 0)
2017-01-12 20:33:08 139725604124416 [Note] WSREP: Member 1.0 (f170852fe1b6) synced with group.
2017-01-12 20:33:08 139725604124416 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 0)
2017-01-12 20:33:08 139726105180928 [Note] WSREP: Synchronized with group, ready for connections
2017-01-12 20:33:08 139726105180928 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2017-01-12 20:33:08 139726185019648 [Note] mysqld: ready for connections.
Version: '10.1.20-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server

Try that, see how it goes. If you run it using docker-compose, it will also work without any problems. This is likely because compose creates a dedicated container network by default. You can see an example compose file in this gist.

Just make sure to use a different directory for each mariadb instance, and after you have your cluster started, stop db1 and relaunch it as a regular cluster member (otherwise the next time db1 is started it will keep bootstrapping a new cluster).
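A minimal sketch of that relaunch step, reusing the names, ports, and paths from the commands above. The function name rejoin_db1 is hypothetical, and it assumes the image accepts an existing data directory when CLUSTER lists the member names:

```shell
# Sketch: replace the bootstrap container with a regular cluster member.
# db2 must still be running so that db1 has a node to rejoin.
rejoin_db1() {
    docker stop db1 &&
    docker rm db1 &&
    docker run -d --network=mariadb -p 3307:3306 --name db1 \
        -e CLUSTER_NAME=test -e CLUSTER=db1,db2 -e MYSQL_ROOT_PASSWORD=test \
        -v /opt/test/db1:/var/lib/mysql \
        dayreiner/centos7-mariadb-10.1-galera:latest
}
```

The only change from the original db1 command is CLUSTER=db1,db2 in place of CLUSTER=BOOTSTRAP; the data directory is kept.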

Answer 2

Works after upgrading the Docker image to MariaDB 10.2.3 (from 10.1.20).

I am not 100% sure whether I have a truly valid cluster now, but at least show status like "wsrep_cluster_size"; produces the following output and the DB is usable:

+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
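To get a little more confidence than the cluster size alone, a few other Galera status variables can be queried on each node. This is a sketch only: galera_status is a hypothetical helper, and it assumes the mysql client is present inside the container and that root can log in with the given password:

```shell
# Hypothetical helper: print key Galera status variables from one node.
# $1 = container name, $2 = MySQL root password.
galera_status() {
    docker exec "$1" mysql -uroot -p"$2" -N -B -e \
        "SHOW GLOBAL STATUS WHERE Variable_name IN
         ('wsrep_cluster_size', 'wsrep_cluster_status',
          'wsrep_local_state_comment', 'wsrep_ready');"
}
```

On a healthy node, wsrep_cluster_status should be Primary, wsrep_local_state_comment should be Synced, wsrep_ready should be ON, and wsrep_cluster_size should match the number of nodes you started.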

Note: I also omitted the -v option and placed the DB files inside the Docker container instead of on an external volume. I don't think this makes a difference for the cluster, but I did not verify 10.2.3 with -v. However, I tried 10.1.20 with both variations (external volume with -v and container-internal files) and neither worked.
