
Deadlock on MTS replication

Situation:

We have master-master replication using GTIDs on Percona MySQL 5.6.32-78.1. The servers host about 10 databases, and we've set slave_parallel_workers=5. One server handles the frontend and the other handles the backend. Two or three times a week, replication on the backend server dies with this error:

2016-10-25 10:00:01 165238 [Warning] Slave SQL: Worker 4 failed executing transaction '0e7b97a8-a689-11e5-8b79-901b0e8b0f53:22506262' at master log mysql-bin.011888, end_log_pos 9306420; Could not execute Update_rows event on table shop.sessions; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log mysql-bin.011888, end_log_pos 9306420, Error_code: 1213
2016-10-25 10:00:01 165238 [ERROR] Slave SQL: ... The slave coordinator and worker threads are stopped, possibly leaving data in inconsistent state. A restart should restore consistency automatically, although using non-transactional storage for data or info tables or DDL queries could lead to problems. In such cases you have to examine your data (see documentation for details). Error_code: 1756
2016-10-25 10:00:01 165238 [Note] Error reading relay log event: slave SQL thread was killed

What could be the reason? There are no cross-database DML statements, and I thought that with MTS only one thread is used per database (the benefit of MTS being parallel replication across several databases). Why does replication break with a deadlock?

EDIT 2016-10-28:

The table schema looks like this:

CREATE TABLE `sessions` (
  `id` int(11) NOT NULL,
  `session_id` char(40) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  `crypt_iv` blob NOT NULL,
  `data` mediumblob NOT NULL,
  `user_id` int(11) NOT NULL,
  `last_refresh` datetime NOT NULL,
  `timeout` datetime NOT NULL,
  `closed` tinyint(4) NOT NULL,
  `inserted` datetime NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `sessions`
  ADD PRIMARY KEY (`id`),
  ADD UNIQUE KEY `session_id` (`session_id`),
  ADD KEY `user_id` (`user_id`),
  ADD KEY `timeout` (`timeout`);
ALTER TABLE `sessions` MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;

So far this error has only happened on the backend server, never on the frontend server. At the moment I cannot paste the exact statement because the binary logs have been purged, but the only statement inside this GTID transaction is a row-based UPDATE on the table.

I guess all sessions are created on the frontend server. Is there perhaps a session cleanup job on the backend server? If so, you have writes to the table from both machines, and the replicated UPDATE applied by a worker thread can collide with a local transaction touching the same rows. A write-heavy table such as sessions should be written on only one machine to avoid this kind of deadlock.
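To check whether a local writer on the backend server really is the other party in the deadlock, something along these lines may help. This is only a sketch: the `shop` schema name is taken from the error message above, the idea that the cleanup runs as a MySQL scheduled event is an assumption (it could just as well be an external cron script, which the last query would not find), and `\G` is the mysql command-line client's vertical-output terminator.

```sql
-- Run on the backend server, where the replication worker failed.

-- The "LATEST DETECTED DEADLOCK" section of the output lists both
-- transactions involved and the locks they held or were waiting for.
SHOW ENGINE INNODB STATUS\G

-- Write every deadlock InnoDB detects to the MySQL error log from now on,
-- so the next occurrence is captured even if the status output is overwritten.
SET GLOBAL innodb_print_all_deadlocks = ON;

-- Assumption: the cleanup job is a MySQL scheduled event in the shop schema.
SELECT event_schema, event_name, status, event_definition
FROM information_schema.events
WHERE event_schema = 'shop';
```

If the deadlock report shows a local DELETE or UPDATE on `sessions` next to the replication worker's transaction, that would support the explanation above.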

In fact, you should always do all writes on one machine only, except in failover cases when one master goes down.
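One way to enforce this, sketched here under the assumption that the application can be reconfigured to send all of its writes to a single server, is to mark the other master as read-only. The `read_only` variable does not stop the replication threads from applying changes, and it does not restrict accounts with the SUPER privilege, so application users should not have that privilege.

```sql
-- On the master that should not receive application writes:
SET GLOBAL read_only = ON;

-- Confirm the setting (returns 1 when read-only is enabled).
SELECT @@global.read_only;

-- During a failover, on the server that takes over the writes:
-- SET GLOBAL read_only = OFF;
```

The setting is lost on restart unless `read_only` is also added to the server's configuration file.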

There are good setups using haproxy with health checks that handle the failover automatically and transparently for your clients.
