
Kafka min.insync.replicas < replication.factor

Suppose I have a cluster with 3 Kafka brokers. I set:

min.insync.replicas=2
default.replication.factor=3
  • All brokers are up, the ISR is fine, and a message arrives with acks=all. Since min.insync.replicas=2, two copies of the message are stored for sure. 1) Will one more copy (because replication=3) be made in the background? 2) If that copy fails, it does not matter, correct? Cluster health is just fine.

  • One broker is down, min.insync.replicas=2 can still be satisfied, and the message is saved to two brokers. After some time the broker that was down comes up again. 3) Since replication=3, will it try to catch up with the others in the background?

I am trying to figure out a practical example where setting the replication factor to be bigger than min.insync.replicas would make sense. A real example I could "touch" and understand. If this is a duplicate, please refer me to it. Thank you.

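Yes, a third copy is made: every follower replica fetches from the leader continuously. (With acks=all the leader waits for all replicas currently in the ISR, so while all three are in sync the third copy is written before the acknowledgement, not after it.)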

Yes, when the broker restarts, its out-of-sync replicas catch up with the leader.

If min.insync.replicas is ever equal to the replication factor, then you cannot lose any broker (due to maintenance or failure) without acks=all writes failing. Therefore, the replication factor should always be greater than min.insync.replicas.
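The rule of thumb above comes down to simple arithmetic. The following is a minimal sketch, not a Kafka API: the function name is made up for illustration, and it assumes that producers use acks=all and that every surviving replica is in sync.

```python
def max_tolerable_broker_failures(replication_factor: int, min_insync_replicas: int) -> int:
    """Number of replica-hosting brokers that may be down while acks=all writes still succeed.

    Hypothetical helper for illustration; assumes all surviving replicas are in sync.
    """
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("expected 1 <= min.insync.replicas <= replication factor")
    return replication_factor - min_insync_replicas


# The configuration from the question: replication factor 3, min.insync.replicas 2.
print(max_tolerable_broker_failures(3, 2))
# Equal values leave no headroom for maintenance or failure.
print(max_tolerable_broker_failures(3, 3))
```

With the values from the question the headroom is one broker, which is why a replication factor larger than min.insync.replicas is useful in practice.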

The other answer is absolutely correct, but it took me quite a while to figure out. imho, this is somewhat subtle, and though my understanding might be a little incorrect here and there, it helped me build a mental model of what is going on.

Suppose I have a cluster of 3 brokers:

[a, b, c]  ->  brokers
[a, b]     -> ISR
[a, b, c]  -> RF

How many brokers can I tolerate to be down? The answer is 1.

  • If I lose broker "c", the ISR can still be satisfied and the cluster will work just fine.

  • If I lose broker "a" (the explanation is the same if I lose "b"), a leader election has to happen. ZooKeeper will be asked which brokers were in sync (who satisfied RF) before I lost one from the ISR. There were 3 of them that were part of RF: a, b, c. Since I lost "a", two in-sync brokers are left: "b" and "c". A new leader is elected and the ISR is satisfied with "b" and "c".

  • This means that I can lose any one broker from the cluster and still work fine. It might be trivial here, but the next example is not so much, imho.
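The three cases above can be replayed with a toy model of a single partition. This is only a sketch under stated assumptions: the function and its arguments are hypothetical, the new leader is simply the first surviving ISR member, and all three replicas are treated as in sync while every broker is healthy (the [a, b] in the diagram then stands for the required minimum of 2, not for the actual ISR).

```python
def handle_broker_failure(leader, isr, failed, min_insync_replicas):
    """Toy model of one partition reacting to the loss of one broker.

    Returns (new_leader, remaining_isr, writable), where `writable` says
    whether acks=all writes can still be accepted.
    """
    # The failed broker drops out of the in-sync replica set.
    remaining_isr = [broker for broker in isr if broker != failed]
    if leader == failed:
        # Simplification: pick the first surviving in-sync replica as leader.
        new_leader = remaining_isr[0] if remaining_isr else None
    else:
        new_leader = leader
    writable = new_leader is not None and len(remaining_isr) >= min_insync_replicas
    return new_leader, remaining_isr, writable


# Lose each broker in turn, starting from leader "a" and three in-sync replicas.
for failed in ["a", "b", "c"]:
    print(failed, handle_broker_failure("a", ["a", "b", "c"], failed, 2))
```

In every one of the three runs two in-sync replicas remain, so the partition stays writable whichever single broker is lost.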


Suppose I have an (artificial) cluster with 5 brokers:

[a, b, c, d, e]  -> brokers
[a, b]           -> ISR
[a, b, c]        -> RF

How many brokers can I tolerate as being down now? Initially I thought 2, but that can't be correct.

  • If I lose "d" and "e", it's simple, the cluster will continue to work just fine.

  • If I lose "a" and "b", in theory a leader election has to happen. But which brokers were part of RF before I lost "a" and "b", i.e. which brokers were in sync? [a, b, c]. There is no way to satisfy the ISR if two of those brokers are down.

  • This means that I can't tolerate an arbitrary pair of brokers being down, so this set-up is not really fault tolerant against the loss of any 2 brokers.

  • It can only be tolerant with two brokers down if my set-up is different:

     5 -> brokers
     3 -> ISR
     5 -> RF
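The five-broker reasoning can be checked by brute force. The sketch below is illustrative only: `failing_combinations` is a hypothetical helper, and it assumes every replica on a live broker is in sync, so a partition stays writable with acks=all as long as enough replica-hosting brokers survive.

```python
from itertools import combinations


def failing_combinations(replica_brokers, all_brokers, min_insync_replicas, brokers_down):
    """List every set of `brokers_down` brokers whose loss blocks acks=all writes.

    Assumes every replica hosted on a live broker is in sync.
    """
    replicas = set(replica_brokers)
    failing = []
    for down in combinations(all_brokers, brokers_down):
        live_replicas = replicas - set(down)
        if len(live_replicas) < min_insync_replicas:
            failing.append(down)
    return failing


brokers = ["a", "b", "c", "d", "e"]

# First set-up: replicas on a, b, c with a minimum of 2 in-sync replicas.
print(failing_combinations(["a", "b", "c"], brokers, 2, 2))

# Second set-up: replicas on all 5 brokers with a minimum of 3 in-sync replicas.
print(failing_combinations(brokers, brokers, 3, 2))
```

The first call lists the pairs drawn from the three replica-hosting brokers, while the second returns an empty list, matching the conclusion that only the 5/3/5 set-up survives any two brokers being down.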

And this is where the other answer is correct and makes total sense:

If you ever have min.insync.replicas <= replication factor, then you cannot lose more brokers than the difference between the two values
