Can't join a MariaDB/Galera node to the cluster after a crash

Question

One of our MariaDB/Galera clusters crashed last week. We started a new cluster with the first node and joined the second node, but we couldn't join a third node.

We removed all files from the data directory, and the system started an SST job. But it seems mysql is getting a cached UUID from somewhere: after the transfer, the node couldn't start and join the cluster. Logs:

2021-07-31 19:01:51 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 1
2021-07-31 19:01:51 0 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 00000000-0000-0000-0000-000000000000:-1
2021-07-31 19:01:52 2 [Note] WSREP: State transfer required: 
    **Group state: 6148b40a-ef57-11eb-92ab-77aa611985cb:581967649**
    Local state: 00000000-0000-0000-0000-000000000000:-
2021-07-31 19:01:52 2 [Note] WSREP: New cluster view: global state: 6148b40a-ef57-11eb-92ab-77aa611985cb:581967649, view# 5: Primary, number of nodes: 3, my index: 2, protocol version 3
2021-07-31 19:01:52 2 [Warning] WSREP: Gap in state sequence. Need state transfer.
2021-07-31 19:01:52 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '10.73.64.104' --datadir '/media/dados/mysql/'   --parent '28752'  ''  '''
2021-07-31 19:01:52 2 [Note] WSREP: Prepared SST request: rsync|10.73.64.104:4444/rsync_sst
2021-07-31 19:01:52 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-07-31 19:01:52 2 [Note] WSREP: REPL Protocols: 9 (4, 2)
2021-07-31 19:01:52 2 [Note] WSREP: Assign initial position for certification: 581967649, protocol version: 4

2021-07-31 19:01:52 2 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (6148b40a-ef57-11eb-92ab-77aa611985cb): 1 (Operation not permitted)
     at galera/src/replicator_str.cpp:prepare_for_IST():482. IST will be unavailable.


2021-07-31 19:01:52 0 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 581967650)
2021-07-31 19:01:52 2 [Note] WSREP: Requesting state transfer: success, donor: 0
2021-07-31 19:01:52 2 [Note] WSREP: GCache history reset: 00000000-0000-0000-0000-000000000000:0 -> 6148b40a-ef57-11eb-92ab-77aa611985cb:581967649


2021-07-31 19:55:01 0 [Note] WSREP: SST complete, seqno: 581967651

2021-07-31 19:55:04 0 [Note] WSREP: SST received: ba9d2e19-a7ed-11e8-ae5d-f7d6266c9160:581967651

2021-07-31 19:55:04 2 [ERROR] WSREP: Application received wrong state: 
    **Received: ba9d2e19-a7ed-11e8-ae5d-f7d6266c9160**
    Required: 6148b40a-ef57-11eb-92ab-77aa611985cb
2021-07-31 19:55:04 2 [ERROR] WSREP: Application state transfer failed. This is unrecoverable condition, restart required.

The cluster is running with UUID 6148b40a-ef57-11eb-92ab-77aa611985cb, but after the SST this node is "receiving" UUID ba9d2e19-a7ed-11e8-ae5d-f7d6266c9160.
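One way to confirm this kind of mismatch is to compare the UUID saved in a node's grastate.dat with the UUID the running cluster reports. The sketch below is only an illustration under assumptions: the helper names grastate_uuid, grastate_seqno and check_grastate are hypothetical, the datadir is the one shown in the log, and the commented usage assumes a mysql client that can query a synced member.

```shell
#!/usr/bin/env bash
# Sketch: compare the UUID saved in grastate.dat with an expected cluster UUID.
# Hypothetical helpers; datadir taken from the log above.
# grastate.dat holds lines such as "uuid:    <uuid>" and "seqno:   <n>".

# Print the value of the "uuid:" field from a grastate.dat file.
grastate_uuid() {
    awk '$1 == "uuid:" { print $2 }' "$1"
}

# Print the value of the "seqno:" field from a grastate.dat file.
grastate_seqno() {
    awk '$1 == "seqno:" { print $2 }' "$1"
}

# Return 0 if the file's UUID equals the expected cluster UUID, 1 otherwise.
check_grastate() {
    local file=$1 expected=$2 actual
    actual=$(grastate_uuid "$file")
    if [[ "$actual" == "$expected" ]]; then
        echo "OK: $file matches cluster UUID $expected"
    else
        echo "MISMATCH: $file has ${actual:-<none>}, cluster is $expected"
        return 1
    fi
}

# Example usage on a real node (assumed, uncomment to run):
# cluster_uuid=$(mysql -N -B -e "SHOW STATUS LIKE 'wsrep_cluster_state_uuid'" | awk '{print $2}')
# check_grastate /media/dados/mysql/grastate.dat "$cluster_uuid"
```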

Does anyone have an idea how to solve this issue?

Thanks, Fernando

Answer 1

What is your wsrep_sst_donor value? Did you start with an empty datadir, in particular without the grastate.dat file? Have you tried increasing the systemd start timeout of the MariaDB process on that node?

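# Create the drop-in directory first; tee fails if it does not exist
sudo mkdir -p /etc/systemd/system/mariadb.service.d
# Raise the start timeout so systemd does not kill mysqld during a long SST
sudo tee /etc/systemd/system/mariadb.service.d/timeoutstartsec.conf <<EOF
[Service]
TimeoutStartSec=1200
EOF
sudo systemctl daemon-reload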
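The three questions above can be checked on the joining node before retrying. This is a hedged sketch, not an official procedure: it assumes the systemd unit is named mariadb, that the donor setting lives in the [mysqld] or [galera] option groups, and the datadir from the question's log; joiner_state_is_fresh and preflight are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: pre-flight checks on the joining node before retrying the SST.
# Assumptions: unit name "mariadb", option groups [mysqld]/[galera],
# datadir from the log; helper names are hypothetical.

ZERO_UUID=00000000-0000-0000-0000-000000000000

# Succeed if the datadir has no grastate.dat, or one whose UUID is all zeros,
# i.e. grastate.dat records no cluster position (IST is impossible, full SST).
joiner_state_is_fresh() {
    local file="$1/grastate.dat" uuid
    [[ -f "$file" ]] || return 0
    uuid=$(awk '$1 == "uuid:" { print $2 }' "$file")
    [[ -z "$uuid" || "$uuid" == "$ZERO_UUID" ]]
}

preflight() {
    local datadir=${1:-/media/dados/mysql}
    # Effective start timeout after the drop-in and daemon-reload.
    systemctl show -p TimeoutStartUSec mariadb
    # Donor setting from the option files (no output: Galera picks a donor).
    my_print_defaults mysqld galera | grep -E 'wsrep[-_]sst[-_]donor'
    if joiner_state_is_fresh "$datadir"; then
        echo "fresh: $datadir/grastate.dat has no saved cluster state"
    else
        echo "stale: $datadir/grastate.dat still records a cluster UUID"
    fi
}

# Usage on the joiner: source this file, then run: preflight /media/dados/mysql
```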

Solution 1
0 2021-10-25 21:06:54