简体   繁体   中英

Percona Xtradb cluster crashing

We have a Percona Xtradb cluster with 5 nodes and an arbitrator. One of our Php developers ran a bad query on the cluster, crashing all the nodes. After the crash, we could not collect any error log to tell us what really went wrong as the entire cluster crashed without performing any logging.

I have always thought that when a single query is executed on the cluster, it is processed by only one of the nodes in the cluster. So if the query is bad (to the point of killing a db server), it should only crash the one node thats processing it, leaving the cluster running with the remaining 4 nodes.

This behavior has puzzled us and we would like to understand what is really going on especially that this is the second time this is happening. Why would a query running on the cluster while processed by one of the nodes would cause other nodes in the cluster to crash in case of some issue while being processed?

Below is our my.cnf config:

#
# Default values.
[mysqld_safe]
flush_caches
numa_interleave
#
#
[mysqld]
back_log = 65535
binlog_format = ROW
character_set_server = utf8
collation_server = utf8_general_ci
datadir = /var/lib/mysql
default_storage_engine = InnoDB
expand_fast_index_creation = 1
expire_logs_days = 7
innodb_autoinc_lock_mode = 2
innodb_buffer_pool_instances = 16
innodb_buffer_pool_populate = 1
innodb_buffer_pool_size = 32G   # XXX 64GB RAM, 80%
innodb_data_file_path = ibdata1:64M;ibdata2:64M:autoextend
innodb_file_format = Barracuda
innodb_file_per_table
innodb_flush_log_at_trx_commit = 2
innodb_flush_method = O_DIRECT
innodb_io_capacity = 1600
innodb_large_prefix
innodb_locks_unsafe_for_binlog = 1
innodb_log_file_size = 64M
innodb_print_all_deadlocks = 1
innodb_read_io_threads = 64
innodb_stats_on_metadata = FALSE
innodb_support_xa = FALSE
innodb_write_io_threads = 64
log-bin = mysqld-bin
log-queries-not-using-indexes
log-slave-updates
long_query_time = 1
max_allowed_packet = 64M
max_connect_errors = 4294967295
max_connections = 4096
min_examined_row_limit = 1000
port = 3306
relay-log-recovery = TRUE
skip-name-resolve
slow_query_log = 1
slow_query_log_timestamp_always = 1
table_open_cache = 4096
thread_cache = 1024
tmpdir = /db/tmp
transaction_isolation = REPEATABLE-READ
updatable_views_with_limit = 0
user = mysql
wait_timeout = 60
#
# Galera Variable config 
wsrep_cluster_address = gcomm://ip_1, ip_2, ip_3,ip_4,ip_4,ip_5
wsrep_cluster_name = cluster_db
wsrep_provider = /usr/lib/libgalera_smm.so
wsrep_provider_options = "gcache.size=4G"
wsrep_slave_threads = 32
wsrep_sst_auth = "user:password"
wsrep_sst_donor = "db1"
#wsrep_sst_method = xtrabackup_throttle
wsrep_sst_method = xtrabackup-v2
#
# XXX You *MUST* change!
server-id = 1

Can you post the query? SELECT queries only execute on a single node but all write queries will execute everywhere. What's in your error log?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM