ZooKeeper multiple leader election issue

I have a distributed application that uses ZooKeeper for leader election. Only the elected leader may commit to the database. I recently discovered a situation that can lead to multiple leaders: if the elected leader is paused by a long GC, it can miss its heartbeats to ZooKeeper, and a new leader is elected. At that point both nodes believe they are the leader, which can lead to conflicting writes.

Any suggestions on how to avoid this situation?

When you use ZooKeeper for leader election, you can't guarantee that the leader is unique. You can run into this situation even without GC pauses: for example, when a leader is isolated from the ZooKeeper quorum by a network partition, or when a leader issues a long-running query and then dies, so that a new leader issues its own query while the old one is still executing.

The workaround is to use compare-and-set when you update the database. Whenever a new leader is elected, it should obtain a monotonically increasing leader ID (e.g. by updating a node in ZooKeeper and using its version or mzxid) and use that ID to guard every transaction it issues.
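A minimal sketch of obtaining such a leader ID, assuming a client whose `ensure_path` and `set` methods behave like those of the kazoo Python library (`set` returns a stat object carrying `version` and `mzxid`). The znode path `/myapp/leader-epoch` and the `InMemoryZk` stand-in are hypothetical and exist only so the sketch runs without a ZooKeeper server.

```python
from collections import namedtuple

# Mirrors the two stat fields the answer mentions; a real client returns more.
Stat = namedtuple("Stat", ["version", "mzxid"])


class InMemoryZk:
    """Hypothetical stand-in for a ZooKeeper client, for local runs only."""

    def __init__(self):
        self._versions = {}
        self._zxid = 0

    def ensure_path(self, path):
        # Create the node with data version 0 if it does not exist yet.
        if path not in self._versions:
            self._zxid += 1
            self._versions[path] = 0

    def set(self, path, value):
        # Every successful write bumps the node's data version.
        self._zxid += 1
        self._versions[path] += 1
        return Stat(version=self._versions[path], mzxid=self._zxid)


def next_leader_id(zk, path="/myapp/leader-epoch"):
    """Call once after winning the election; returns an increasing number."""
    zk.ensure_path(path)
    stat = zk.set(path, b"")  # the write itself is what advances the version
    return stat.version       # stat.mzxid would serve the same purpose


zk = InMemoryZk()
first_leader = next_leader_id(zk)
second_leader = next_leader_id(zk)
```

With a real client, the same function would be called once by each newly elected leader, and the returned value passed along with every database transaction that leader issues.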

For example, if you want to change the state of the database, then instead of the following transaction:

BEGIN TRANSACTION;
db.update($change);
END TRANSACTION;

you should use something like:

BEGIN TRANSACTION;
if (db.leaderID <= $leaderID) {
    db.leaderID = $leaderID;
    db.update($change);
}
END TRANSACTION;

This trick protects your system from the uncertainty caused by concurrent leaders. Of course, your database must be linearizable and support compare-and-set.
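The guarded transaction above can be sketched against a real database. The following is an illustration only, using SQLite from the Python standard library; the table names `fence` and `changes` and the function names are assumptions made for this example, not part of the original answer.

```python
import sqlite3


def open_db():
    # isolation_level=None disables implicit transactions so that
    # BEGIN / COMMIT / ROLLBACK can be issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE fence ("
        "id INTEGER PRIMARY KEY CHECK (id = 1), "
        "leader_id INTEGER NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE changes (leader_id INTEGER NOT NULL, payload TEXT NOT NULL)"
    )
    conn.execute("INSERT INTO fence (id, leader_id) VALUES (1, 0)")
    return conn


def guarded_update(conn, leader_id, change):
    """Apply `change` only if no newer leader has written yet."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        # Compare-and-set: the row matches only when the stored ID is
        # less than or equal to the caller's ID.
        cur = conn.execute(
            "UPDATE fence SET leader_id = ? WHERE id = 1 AND leader_id <= ?",
            (leader_id, leader_id),
        )
        accepted = cur.rowcount == 1
        if accepted:
            conn.execute(
                "INSERT INTO changes (leader_id, payload) VALUES (?, ?)",
                (leader_id, change),
            )
        conn.execute("COMMIT")
        return accepted
    except Exception:
        conn.execute("ROLLBACK")
        raise


conn = open_db()
old_ok = guarded_update(conn, 1, "write from leader 1")
new_ok = guarded_update(conn, 2, "write from leader 2")
stale_ok = guarded_update(conn, 1, "late write from paused leader 1")
```

Here the third call stands for the old leader waking up from a GC pause: its ID is now lower than the stored one, so the write is rejected and the caller learns it is no longer the leader.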

To correct one of the answers: ZooKeeper does guarantee leader uniqueness under network partitioning, thanks to its quorum-based consistency. During a network partition, a leader that is isolated from the quorum loses its leadership because it cannot connect to a quorum of nodes. Meanwhile, a new leader is elected in the other partition. For the same reason, the partition that contains the old leader cannot elect a new leader. Once the partition heals, the situation is resolved by a new leader election.
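The quorum argument comes down to simple arithmetic: a quorum is a strict majority of the ensemble, so at most one side of a partition can hold one. A small sketch, with function names chosen for this illustration:

```python
def quorum_size(ensemble_size):
    # A strict majority of the ensemble.
    return ensemble_size // 2 + 1


def partitions_with_quorum(partition_sizes):
    """Return the sizes of the partitions that can still elect a leader."""
    needed = quorum_size(sum(partition_sizes))
    return [size for size in partition_sizes if size >= needed]


# A five-node ensemble split 3 / 2: only the three-node side has a quorum.
split_3_2 = partitions_with_quorum([3, 2])

# Split 2 / 2 / 1: no side has a quorum, so no leader can be elected at all.
split_2_2_1 = partitions_with_quorum([2, 2, 1])
```

Note that this concerns which side can elect a leader. It does not by itself tell a paused or isolated process that its leadership has already lapsed, which is the case the compare-and-set guard addresses.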
