简体   繁体   中英

Mysql group replication: stuck on RECOVERING forever

I have two MySQL server which are running on same group replication. The setup had been done by below steps:

  1. The first server is production server with lot of data.
  2. I set it readonly and dump the data, then restore it on the fallback MySQL server
  3. After restored, I executed "START GROUP_REPLICATION" and it success join to the group. All data are sync well between two server.

But I found an other error: Every time I reload the fallback MySQL (by restart services) it will auto join to group but stuck at RECOVERING forever, I waited for 3 days but it still in RECOVERING. I checked on the log and don't see any error on both server, everything look good except the fallback is running as readonly and stay at RECOVERING.

What step did I missed?

My group configuration is (I followed the instruction from DigitalOcean help page at https://www.digitalocean.com/community/tutorials/how-to-configure-mysql-group-replication-on-ubuntu-16-04 ):

BINARY LOGGING # log_bin = /data/databases/mysql_bin199 expire_logs_days = 14

sync_binlog = 1 binlog_format = ROW

General replication settings gtid_mode = ON enforce_gtid_consistency = ON master_info_repository = TABLE relay_log_info_repository = TABLE binlog_checksum = NONE log_slave_updates = ON

log_bin = binlog

binlog_format = ROW transaction_write_set_extraction = XXHASH64 loose-group_replication_bootstrap_group = OFF

loose-group_replication_start_on_boot = ON loose-group_replication_ssl_mode = REQUIRED loose-group_replication_recovery_use_ssl = 1

Shared replication group configuration loose-group_replication_group_name =

"9dc4ae01-6664-437a-83f8-80546d58e025" loose-group_replication_ip_whitelist = "172.AAA.BBB.166,138.AAA.BBB.199" loose-group_replication_group_seeds = "172.AAA.BBB.166:33061,138.AAA.BBB.199:33061"

Single or Multi-primary mode? Uncomment these two lines

for multi-primary mode, where any host can accept writes loose-group_replication_single_primary_mode = OFF

loose-group_replication_enforce_update_everywhere_checks = ON

Host specific replication configuration server_id = 2 report_host = "138.AAA.BBB.199" loose-group_replication_local_address =

"138.AAA.BBB.199:33061"

Below is MySQL log on first server:

2018-06-08T06:10:12.167400Z 0 [Warning] Plugin group_replication

reported: 'Members removed from the group: 138.AAA.BBB.199:3306'

2018-06-08T06:10:12.167475Z 0 [Note] Plugin group_replication

reported: 'Group membership changed to 172.AAA.BBB.166:3306 on view

15271181169364149:11.' 2018-06-08T06:11:59.032666Z 0 [Note] Plugin

group_replication reported: 'Members joined the group:

138.AAA.BBB.199:3306' 2018-06-08T06:11:59.032722Z 0 [Note] Plugin group_replication reported: 'Group membership changed to

172.AAA.BBB.166:3306, 138.AAA.BBB.199:3306 on view 15271181169364149:12.'

Below is the MySQL log on fallback server:

2018-06-11T09:22:57.490896Z 0 [Warning] option 'max_allowed_packet':

unsigned value 3221225472 adjusted to 1073741824

2018-06-11T09:22:57.490942Z 0 [Warning] The use of InnoDB is mandatory

since MySQL 5.7. The former options like '--innodb=0/1/OFF/ON' or

'--skip-innodb' are ignored. 2018-06-11T09:22:57.491057Z 0 [Warning]

The syntax '--log_warnings/-W' is deprecated and will be removed in a

future release. Please use '--log_error_verbosity' instead.

2018-06-11T09:22:57.491098Z 0 [Warning] TIMESTAMP with implicit

DEFAULT value is deprecated. Please use

--explicit_defaults_for_timestamp server option (see documentation for more details). 2018-06-11T09:22:57.492972Z 0 [Note] /usr/sbin/mysqld

(mysqld 5.7.22-log) starting as process 31633 ...

2018-06-11T09:22:57.500063Z 0 [Warning] InnoDB: Using

innodb_locks_unsafe_for_binlog is DEPRECATED. This option may be

removed in future releases. Please use READ COMMITTED transaction

isolation level instead; Please refer to

http://dev.mysql.com/doc/refman/5.7/en/set-transaction.html

2018-06-11T09:22:57.500175Z 0 [Note] InnoDB: PUNCH HOLE support

available 2018-06-11T09:22:57.500191Z 0 [Note] InnoDB: Mutexes and

rw_locks use GCC atomic builtins 2018-06-11T09:22:57.500200Z 0 [Note]

InnoDB: Uses event mutexes 2018-06-11T09:22:57.500205Z 0 [Note]

InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier

2018-06-11T09:22:57.500209Z 0 [Note] InnoDB: Compressed tables use

zlib 1.2.3 2018-06-11T09:22:57.500213Z 0 [Note] InnoDB: Using Linux

native AIO 2018-06-11T09:22:57.500430Z 0 [Note] InnoDB: Number of

pools: 1 2018-06-11T09:22:57.500575Z 0 [Note] InnoDB: Using CPU crc32

instructions 2018-06-11T09:22:57.501015Z 0 [ERROR] InnoDB: Failed to

create check sector file, errno:13 Please confirm O_DIRECT is

supported and remove the file /data/check_sector_size if it exists.

2018-06-11T09:22:57.502305Z 0 [Note] InnoDB: Initializing buffer pool,

total size = 4G, instances = 8, chunk size = 128M

2018-06-11T09:22:57.799065Z 0 [Note] InnoDB: Completed initialization

of buffer pool 2018-06-11T09:22:57.857325Z 0 [Note] InnoDB: If the

mysqld execution user is authorized, page cleaner thread priority can

be changed. See the man page of setpriority().

2018-06-11T09:22:57.870317Z 0 [Note] InnoDB: Highest supported file

format is Barracuda. 2018-06-11T09:22:58.081570Z 0 [Note] InnoDB:

Creating shared tablespace for temporary tables

2018-06-11T09:22:58.081656Z 0 [Note] InnoDB: Setting file

'/data/databases/ibtmp1' size to 12 MB. Physically writing the file

full; Please wait ... 2018-06-11T09:22:58.116190Z 0 [Note] InnoDB:

File '/data/databases/ibtmp1' size is now 12 MB.

2018-06-11T09:22:58.117279Z 0 [Note] InnoDB: 96 redo rollback

segment(s) found. 96 redo rollback segment(s) are active.

2018-06-11T09:22:58.117293Z 0 [Note] InnoDB: 32 non-redo rollback

segment(s) are active. 2018-06-11T09:22:58.117670Z 0 [Note] InnoDB:

Waiting for purge to start 2018-06-11T09:22:58.168094Z 0 [Note]

InnoDB: 5.7.22 started; log sequence number 51745666191

2018-06-11T09:22:58.168309Z 0 [Note] InnoDB: Loading buffer pool(s)

from /data/databases/ib_buffer_pool 2018-06-11T09:22:58.168558Z 0

[Note] Plugin 'FEDERATED' is disabled. 2018-06-11T09:22:58.183268Z 0

[Warning] CA certificate /etc/mysql/mysql-ssl/ca-cert.pem is self

signed. 2018-06-11T09:22:58.184615Z 0 [Note] Server hostname

(bind-address): '138.AAA.BBB.199'; port: 3306

2018-06-11T09:22:58.184636Z 0 [Note] - '138.AAA.BBB.199' resolves to

'138.AAA.BBB.199'; 2018-06-11T09:22:58.184668Z 0 [Note] Server socket

created on IP: '138.AAA.BBB.199'. 2018-06-11T09:22:58.186203Z 0

[Warning] 'user' entry 'mysql.session@localhost' ignored in

--skip-name-resolve mode. 2018-06-11T09:22:58.186220Z 0 [Warning] 'user' entry 'mysql.sys@localhost' ignored in --skip-name-resolve

mode. 2018-06-11T09:22:58.186238Z 0 [Warning] 'user' entry

'phpmadsys@localhost' ignored in --skip-name-resolve mode.

2018-06-11T09:22:58.186260Z 0 [Warning] 'user' entry

'phpmyadmin@localhost' ignored in --skip-name-resolve mode.

2018-06-11T09:22:58.186308Z 0 [Warning] 'db' entry 'performance_schema

mysql.session@localhost' ignored in --skip-name-resolve mode.

2018-06-11T09:22:58.186313Z 0 [Warning] 'db' entry 'sys

mysql.sys@localhost' ignored in --skip-name-resolve mode.

2018-06-11T09:22:58.186318Z 0 [Warning] 'db' entry 'phpmyadmin

phpmadsys@localhost' ignored in --skip-name-resolve mode.

2018-06-11T09:22:58.186322Z 0 [Warning] 'db' entry 'performance_schema

datadog@localhost' ignored in --skip-name-resolve mode.

2018-06-11T09:22:58.186327Z 0 [Warning] 'db' entry 'phpmyadmin

phpmyadmin@localhost' ignored in --skip-name-resolve mode.

2018-06-11T09:22:58.186340Z 0 [Warning] 'proxies_priv' entry '@

root@localhost' ignored in --skip-name-resolve mode.

2018-06-11T09:22:58.188628Z 0 [Warning] 'tables_priv' entry 'user

mysql.session@localhost' ignored in --skip-name-resolve mode.

2018-06-11T09:22:58.188649Z 0 [Warning] 'tables_priv' entry

'sys_config mysql.sys@localhost' ignored in --skip-name-resolve mode.

2018-06-11T09:22:58.192624Z 0 [Warning] Neither --relay-log nor

--relay-log-index were used; so replication may break when this MySQL server acts as a slave and has his hostname changed!! Please use

'--relay-log=dvm02-relay-bin' to avoid this problem.

2018-06-11T09:22:58.206545Z 0 [Note] Event Scheduler: Loaded 0 events

2018-06-11T09:22:58.206745Z 0 [Note] /usr/sbin/mysqld: ready for

connections. Version: '5.7.22-log' socket:

'/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server

(GPL) 2018-06-11T09:22:58.207175Z 2 [Note] Plugin group_replication

reported: 'Group communication SSL configuration:

group_replication_ssl_mode: "REQUIRED"; server_key_file:

"/etc/mysql/mysql-ssl/server-key.pem"; server_cert_file:

"/etc/mysql/mysql-ssl/server-cert.pem"; client_key_file:

"/etc/mysql/mysql-ssl/server-key.pem"; client_cert_file:

"/etc/mysql/mysql-ssl/server-cert.pem"; ca_file:

"/etc/mysql/mysql-ssl/ca-cert.pem"; ca_path: ""; cipher: "";

tls_version: "TLSv1,TLSv1.1"; crl_file: ""; crl_path: ""'

2018-06-11T09:22:58.207378Z 2 [Warning] Plugin group_replication

reported: '[GCS] Automatically adding IPv4 localhost address to the

whitelist. It is mandatory that it is added.'

2018-06-11T09:22:58.207820Z 2 [Note] Plugin group_replication

reported: 'Initialized group communication with configuration:

group_replication_group_name: "9dc4ae01-6664-437a-83f8-80546d58e025";

group_replication_local_address: "138.AAA.BBB.199:33061";

group_replication_group_seeds:

"172.AAA.BBB.166:33061,138.AAA.BBB.199:33061";

group_replication_bootstrap_group: false;

group_replication_poll_spin_loops: 0;

group_replication_compression_threshold: 1000000;

group_replication_ip_whitelist: "172.AAA.BBB.166,138.AAA.BBB.199"'

2018-06-11T09:22:58.207853Z 2 [Note] Plugin group_replication

reported: '[GCS] Configured number of attempts to join: 0'

2018-06-11T09:22:58.207859Z 2 [Note] Plugin group_replication

reported: '[GCS] Configured time between attempts to join: 5 seconds'

2018-06-11T09:22:58.207878Z 2 [Note] Plugin group_replication

reported: 'Member configuration: member_id: 2; member_uuid:

"822868f9-52a0-11e8-aa0e-1e45f9551f27"; single-primary mode: "false";

group_replication_auto_increment_increment: 7; '

2018-06-11T09:22:58.209024Z 3 [Note] 'CHANGE MASTER TO FOR CHANNEL

'group_replication_applier' executed'. Previous state

master_host='', master_port= 0, master_log_file='',

master_log_pos= 4, master_bind=''. New state master_host='',

master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.

2018-06-11T09:22:58.216904Z 6 [Note] Slave SQL thread for channel

'group_replication_applier' initialized, starting replication in log

'FIRST' at position 0, relay log

'./dvm02-relay-bin-group_replication_applier.000071' position: 4

2018-06-11T09:22:58.216931Z 2 [Note] Plugin group_replication

reported: 'Group Replication applier module successfully initialized!'

2018-06-11T09:22:58.241357Z 0 [Note] Plugin group_replication

reported: 'XCom protocol version: 3' 2018-06-11T09:22:58.241397Z 0

[Note] Plugin group_replication reported: 'XCom initialized and ready

to accept incoming connections on port 33061'

2018-06-11T09:22:59.213826Z 0 [Note] InnoDB: Buffer pool(s) load

completed at 180611 11:22:59 2018-06-11T09:23:00.316791Z 0 [Note]

Plugin group_replication reported: 'Group membership changed to

172.AAA.BBB.166:3306, 138.AAA.BBB.199:3306 on view 15271181169364149:16.'

sorry for the delay.

From your error log I can't see any issues on recovery being thrown but I either don't see any connection attempts. I wonder if you have some issues with data in the group replication relay logs...

I suggest you open a bug if the problem still persists. As a workaround you can try to reset the applier channel before "START GROUP_REPLICATION"

RESET SLAVE ALL FOR CHANNEL "group_replication_applier";

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM