
Galera node can't connect to cluster

Hello, I am using Galera with 10.1.12-MariaDB, and the SST method is xtrabackup-v2.

Please don't recommend SST=rsync; it does not work for me.

I have a healthy cluster of 8 nodes. Sometimes one or a few nodes go down. I just run service mysql start on them, they successfully reconnect to the cluster, and all is OK.

BUT sometimes, when the disconnected nodes have been down for a few days, I can't connect them to the cluster.

After a few tries I ran rm -fr /var/lib/mysql/* and rm -fr /var/log/mysql/*, and that did not help either. The nodes have this message in syslog:

mysqld: [ERROR] Binlog file '/var/log/mysql/mariadb-bin.003079' not found in binlog index, needed for recovery. Aborting.

I know how to work around this. When I have nodes that can't connect to the cluster with the message above, I can recover the cluster like this:

  1. Shut down all nodes except one.
  2. Shut down the last node and run rm -fr /var/log/mysql/* on it.
  3. Bootstrap this last node with its binlogs deleted.
  4. Connect the other nodes to the cluster with service mysql start.
  5. Profit: all is OK.
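The five steps above can be written out as a dry run. This is only a sketch: it prints the commands per node and executes nothing. The node names are hypothetical, and `service mysql bootstrap` is an assumption about how the last node gets bootstrapped (the question does not name the command; on systemd installs it would typically be `galera_new_cluster`).

```shell
#!/bin/sh
# Print (do NOT execute) the commands of the full-restart procedure.
# Usage: print_restart_plan LAST_NODE OTHER_NODE...
# Node names are placeholders; nothing here contacts a server.
print_restart_plan() {
    last=$1
    shift
    for node in "$@"; do
        echo "[$node] service mysql stop"      # step 1: stop every node but one
    done
    echo "[$last] service mysql stop"          # step 2: stop the last node
    echo "[$last] rm -fr /var/log/mysql/*"     # step 2: delete its binlogs
    echo "[$last] service mysql bootstrap"     # step 3: assumed bootstrap command
    for node in "$@"; do
        echo "[$node] service mysql start"     # step 4: rejoin the other nodes
    done
}
```

Calling `print_restart_plan db8 db1 db2` lists the commands in order, which makes it easy to see that the whole cluster is offline between step 2 and step 3.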

But the problem is:

I can't take all production nodes down, including the last one, because I have 8 nodes serving heavy site traffic, and a single running node goes down immediately when all traffic hits it (because of overload, of course).

QUESTION IS:

Please help me. How do I connect nodes to the cluster when they won't connect and fail with this error: mysqld: [ERROR] Binlog file '/var/log/mysql/mariadb-bin.003079' not found in binlog index, needed for recovery. Aborting.

How big is the gcache? That controls whether IST can be used for re-attaching a node or not.
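One way to answer the gcache question is to read `gcache.size` out of the `wsrep_provider_options` variable. The helper below is a minimal sketch that only parses the options string; the function name is made up for illustration, and the sample invocation assumes a local `mysql` client that can log in without extra arguments.

```shell
#!/bin/sh
# Extract gcache.size from a wsrep_provider_options string.
# $1: the semicolon-separated options string, e.g. the value column of
#     SHOW VARIABLES LIKE 'wsrep_provider_options'
gcache_size() {
    printf '%s\n' "$1" \
        | tr ';' '\n' \
        | sed -n 's/^[[:space:]]*gcache\.size[[:space:]]*=[[:space:]]*//p'
}

# Example against a running node (assumes passwordless local client access):
#   opts=$(mysql -N -B -e "SELECT @@GLOBAL.wsrep_provider_options")
#   gcache_size "$opts"
```

Note that IST also needs the joiner's own state, so a node whose /var/lib/mysql was wiped will fall back to SST regardless of the gcache size.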

What is the value of expire_logs_days? Is it so small that the binlog was lost before you tried to connect? If you lost one node, and need another as the donor for SST, you still have 6 to serve the 'big site'. It sounds like you need to increase the deployment to maybe 10 nodes in order to handle the site even when nodes wink out.
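To check the configured binlog retention without logging in to the server, the setting can be read from the option file. This is a rough sketch: the function name is hypothetical, the path in the example is an assumption about the install, and `!include` directives are not followed. On a running server, `SHOW VARIABLES LIKE 'expire_logs_days'` is the authoritative source.

```shell
#!/bin/sh
# Print the last expire_logs_days value found in a MariaDB option file.
# Prints nothing if the option is not set there (the server default of 0
# means binlogs are never purged automatically).
# $1: path to the option file
expire_logs_days_from_cnf() {
    sed -n 's/^[[:space:]]*expire[_-]logs[_-]days[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" \
        | tail -n 1
}

# Example (path is an assumption about the install):
#   expire_logs_days_from_cnf /etc/mysql/my.cnf
```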

It sounds like you are stuck with SST.

Take a look at the slowlog to see if some queries are taking so long that they are, indirectly, forcing you to have so many machines. Fixing a couple of queries is a lot 'cheaper' than adding extra machines.
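As a quick first pass over the slowlog, something like the sketch below counts how many logged queries exceed a given duration. It assumes the standard text slow log format with `# Query_time:` header lines; the function name and the log path in the example are illustrative only. Tools such as mysqldumpslow or pt-query-digest give a much fuller picture.

```shell
#!/bin/sh
# Count slow-log entries whose Query_time is above a threshold.
# $1: threshold in seconds; the slow query log is read from stdin.
count_slow_queries() {
    awk -v limit="$1" '
        # Header lines look like: "# Query_time: 12.5  Lock_time: ..."
        /^# Query_time:/ { if ($3 + 0 > limit + 0) n++ }
        END { print n + 0 }
    '
}

# Example (log path is an assumption about the install):
#   count_slow_queries 5 < /var/log/mysql/mariadb-slow.log
```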
