Crash Fault Tolerance via Heartbeat

I understand the concept of Crash Fault Tolerance (CFT) in theory. CFT is used to guarantee that the system keeps running even if the leader server crashes. I need to implement a distributed system (a chat application) that is also crash fault tolerant. For this I have to use a so-called "heartbeat" to check whether the leader server is still "alive".

My question: could someone show me a good code example of how to implement such a heartbeat?

Whether a heartbeat mechanism is applicable depends on the size of the cluster and on your typical use case or deployment scenario.

Many consensus-based algorithms rely on heartbeats to determine whether the leader server is still alive.

You can look at the Raft algorithm, in which the leader sends heartbeats to its followers; you can also use its leader election mechanism in case the leader crashes.
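As a rough illustration of the follower side of that idea, here is a minimal Python sketch. It is not Raft itself, only a timeout check: the follower remembers when it last heard from the leader and suspects a crash once no heartbeat has arrived within a timeout. The names HeartbeatMonitor, record_heartbeat and leader_suspected are made up for this example, and how heartbeats actually arrive (TCP, UDP, RPC) is left out on purpose.

```python
import time
from typing import Callable


class HeartbeatMonitor:
    """Follower-side view of the leader, based on heartbeat arrival times.

    Hypothetical helper for illustration; not part of any Raft library.
    """

    def __init__(self, timeout: float,
                 clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout          # seconds of silence tolerated
        self._clock = clock             # injectable so it can be tested
        self._last_seen = clock()       # the timer starts at startup

    def record_heartbeat(self) -> None:
        """Call this whenever a heartbeat message from the leader arrives."""
        self._last_seen = self._clock()

    def leader_suspected(self) -> bool:
        """True once no heartbeat has arrived for longer than the timeout."""
        return self._clock() - self._last_seen > self.timeout
```

In a chat server, the network handler would call record_heartbeat() for every heartbeat message it receives, and a periodic task would poll leader_suspected() and start a leader election when it returns True. The timeout should be several times the leader's send interval so that a single delayed message does not trigger an election.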

For large clusters, a heartbeat mechanism alone might not scale, so failure detectors combined with gossip-based protocols for propagating the information are preferred.
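To give a feel for the gossip variant, here is a small sketch of a gossip-style heartbeat table, under simplifying assumptions. Each node increments its own counter, exchanges a digest of counters with a few peers, and suspects any member whose counter has not grown for a while. GossipTable and its methods are hypothetical names, timestamps are passed in explicitly, and peer selection and networking are omitted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    counter: int       # heartbeat counter, incremented only by its owner
    updated_at: float  # local time when we last saw the counter grow


class GossipTable:
    """Per-node membership table (illustrative sketch only)."""

    def __init__(self, node_id: str, fail_after: float):
        self.node_id = node_id
        self.fail_after = fail_after
        self.entries: Dict[str, Entry] = {node_id: Entry(0, 0.0)}

    def tick(self, now: float) -> None:
        """Advance our own heartbeat counter."""
        me = self.entries[self.node_id]
        me.counter += 1
        me.updated_at = now

    def digest(self) -> Dict[str, int]:
        """What we would send to a randomly chosen peer."""
        return {node: e.counter for node, e in self.entries.items()}

    def merge(self, remote: Dict[str, int], now: float) -> None:
        """Adopt any counter that is newer than the one we hold."""
        for node, counter in remote.items():
            if node == self.node_id:
                continue  # only we may advance our own counter
            local = self.entries.get(node)
            if local is None or counter > local.counter:
                self.entries[node] = Entry(counter, now)

    def suspected(self, now: float) -> List[str]:
        """Members whose counter has been stale longer than fail_after."""
        return sorted(
            node for node, e in self.entries.items()
            if node != self.node_id and now - e.updated_at > self.fail_after
        )
```

The point of this design is that a node does not need a direct heartbeat from every other node: a fresh counter for a member can reach it through any peer, which is what lets the approach scale beyond all-to-all heartbeats.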

A few references: https://raft.github.io/ , https://github.com/topics/gossip-protocol
