Percona XtraDB Cluster crashing

We have a Percona XtraDB Cluster with 5 nodes and an arbitrator. One of our PHP developers ran a bad query on the cluster, and it crashed all the nodes. The whole cluster went down without writing anything to the logs, so we could not collect any error log that would tell us what actually went wrong.

I have always thought that when a single query is executed on the cluster, it is processed by only one of the nodes. So if a query is bad enough to kill a database server, it should crash only the node that is processing it and leave the cluster running on the remaining 4 nodes.
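
The expectation that the cluster should survive one failed node is consistent with Galera's quorum arithmetic. A minimal sketch of that rule, assuming every member (including the arbitrator, garbd) keeps the default weight `pc.weight=1` and that nodes fail abruptly rather than leaving gracefully; `keeps_quorum` is a hypothetical helper, not part of any Galera API:

```python
def keeps_quorum(previous_weights: list[int], surviving_weights: list[int]) -> bool:
    """Return True if the surviving members still form a Primary Component.

    Simplified Galera quorum rule, assuming no member left gracefully:
    the survivors' total weight must be strictly greater than half of the
    total weight of the previous Primary Component.
    """
    return 2 * sum(surviving_weights) > sum(previous_weights)


if __name__ == "__main__":
    # 5 data nodes + 1 arbitrator (garbd), each with the assumed default pc.weight=1
    members = [1] * 6
    print(keeps_quorum(members, [1] * 5))  # one node crashes -> True
    print(keeps_quorum(members, [1] * 3))  # half the members crash -> False
```

Under these assumptions, losing a single node leaves 5 of 6 votes, which is well above half. Quorum loss alone would therefore not explain every node going down.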

This behavior puzzles us, and we would like to understand what is really going on, especially since this is the second time it has happened. Why would a query that is processed by one node cause the other nodes in the cluster to crash when something goes wrong during processing?

Below is our my.cnf config:

#
# Default values.
[mysqld_safe]
flush_caches
numa_interleave
#
#
[mysqld]
back_log = 65535
binlog_format = ROW
character_set_server = utf8
collation_server = utf8_general_ci
datadir = /var/lib/mysql
default_storage_engine = InnoDB
expand_fast_index_creation = 1
expire_logs_days = 7
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 16
innodb_buffer_pool_populate = 1
innodb_buffer_pool_size = 32G   # XXX 64GB RAM, so 32G is 50%
innodb_data_file_path = ibdata1:64M;ibdata2:64M:autoextend
innodb_file_format = Barracuda
innodb_file_per_table
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_io_capacity = 1600
innodb_large_prefix
innodb_locks_unsafe_for_binlog = 1
innodb_log_file_size = 64M
innodb_print_all_deadlocks = 1
innodb_read_io_threads = 64
innodb_stats_on_metadata = FALSE
innodb_support_xa = FALSE
innodb_write_io_threads = 64
log-bin = mysqld-bin
log-queries-not-using-indexes
log-slave-updates
long_query_time = 1
max_allowed_packet = 64M
max_connect_errors = 4294967295
max_connections = 4096
min_examined_row_limit = 1000
port = 3306
relay-log-recovery = TRUE
skip-name-resolve
slow_query_log = 1
slow_query_log_timestamp_always = 1
table_open_cache = 4096
thread_cache = 1024
tmpdir = /db/tmp
transaction_isolation = REPEATABLE-READ
updatable_views_with_limit = 0
user = mysql
wait_timeout = 60
#
# Galera Variable config 
wsrep_cluster_address = gcomm://ip_1,ip_2,ip_3,ip_4,ip_5
wsrep_cluster_name = cluster_db
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_provider_options = "gcache.size=4G"
wsrep_slave_threads = 32
wsrep_sst_auth = "user:password"
wsrep_sst_donor = "db1"
#wsrep_sst_method = xtrabackup_throttle
wsrep_sst_method = xtrabackup-v2
#
# XXX You *MUST* change!
server-id = 1

Can you post the query? SELECT queries execute only on the node that receives them, but every write query is replicated to and applied on all nodes. What's in your error log?
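
To illustrate the point in the comment, here is a toy Python model. It is not Galera code: `Node`, `run` and the `"BAD"` marker are hypothetical. It assumes every write is ordered cluster-wide and then executed as the same statement on each node, which is closest to how DDL behaves under Galera's default Total Order Isolation (TOI) mode. Ordinary DML is instead replicated as row-based write-sets after it has run on the originating node, so the model is a simplification for that case.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    alive: bool = True

    def execute(self, stmt: str) -> None:
        # Hypothetical stand-in for a server bug: any statement containing
        # the marker "BAD" crashes whichever node executes it.
        if "BAD" in stmt:
            self.alive = False


def is_read(stmt: str) -> bool:
    return stmt.lstrip().upper().startswith("SELECT")


def run(cluster: list[Node], origin: int, stmt: str) -> None:
    """Toy model of statement routing in a Galera-style cluster."""
    if is_read(stmt):
        # Reads never leave the node that received them.
        cluster[origin].execute(stmt)
        return
    # Writes are ordered cluster-wide first and then executed on every live
    # node (TOI-like behaviour), so a statement that crashes one node
    # crashes all of them.
    for node in cluster:
        if node.alive:
            node.execute(stmt)


if __name__ == "__main__":
    cluster = [Node(f"db{i}") for i in range(1, 6)]
    run(cluster, 2, "SELECT * FROM t WHERE BAD")
    print([n.alive for n in cluster])  # [True, True, False, True, True]
    run(cluster, 0, "ALTER TABLE t BAD")
    print([n.alive for n in cluster])  # [False, False, False, False, False]
```

In this model a bad read takes down only the node that ran it, while a bad replicated statement takes down every node. That matches the symptom described in the question.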
