简体   繁体   中英

Kafka 3.3.1 Active/Active consumers and producers

We have 2 diff kafka clusters with 10 brokers in each cluster and each cluster has its own Zookeeper cluster. We also have setup MirrorMaker 2 to sync data between the clusters. With MM2, the offset is also being synced along with data.

Looking forward to setup Active/Active for my consumer application as well as producer application.

Lets say the clusters are DC1 & DC2. Topic name is test-mm.

With MM2 setup,

In DC1,
  test-mm
  test-mm-DC2(Mirror of DC2)

In DC2,
  test-mm
  test-mm-DC1(Mirror of DC1)

Consumer Active/ Active

In DC1 , I have an application consuming data from test-mm & test-mm-DC2 with the consumer group name group1-test .

In DC2 , The same application is consuming data from test-mm & test-mm-DC1 with the consumer group name group1-test .

Application is running as Active/Active on both DCs.

Now producer in DC1 is producing to the topic test-mm in DC1 and it gets mirrored to the topic test-mm-DC1 in DC2. My assumption here is, the offset gets synced so, with the same consumer group name, we can run consumer application on both DCs and only one consumer will get and process the message. Also, when the consumer application in DC1 goes down, the consumer application in DC2 will start processing and we can achieve the real active/active for consumers. Is this correct?

Producer active/active ,

It may not be possible with Producer in DC1 and Producer 2 in DC2 as the sequence may not be maintained with 2 different producer. Not sure if Active/Active can be achieved with producer.

You will want two producers, one producing to test-mm in DC1 and the other producing to test-mm in DC2. Once messages have been produced to test-mm in DC1 this will be replicated to test-mm-DC1 in DC2 and vice versa. This is achieving active / active as the data will exist on both DCs, your consumers are also consuming from both DCs and if one DC fails the other producer and consumer will continue as normal. Please let me know if this has not answered your question.

Hopefully my comment answers your question about exactly once processing with MM2. The Stack Overflow post I linked takes the following paragraph from the IBM guide: https://ibm-cloud-architecture.github.io/refarch-eda/technology/kafka-mirrormaker/#record-duplication

This Cloudera blog also mentions that exactly once processing does not apply across multiple clusters: https://blog.cloudera.com/a-look-inside-kafka-mirrormaker-2/

Cross-cluster Exactly Once Guarantee

Kafka provides support for exactly-once processing but that guarantee is provided only within a given Kafka cluster and does not apply across multiple clusters. Cross-cluster replication cannot directly take advantage of the exactly-once support within a Kafka cluster. This means MM2 can only provide at least once semantics when replicating data across the source and target clusters which implies there could be duplicate records downstream.

Now with regards to the below question:

Now producer in DC1 is producing to the topic test-mm in DC1 and it gets mirrored to the topic test-mm-DC1 in DC2. My assumption here is, the offset gets synced so, with the same consumer group name, we can run consumer application on both DCs and only one consumer will get and process the message. Also, when the consumer application in DC1 goes down, the consumer application in DC2 will start processing and we can achieve the real active/active for consumers. Is this correct?

See this post here, they ask a similar question: How are consumers setup in Active - Active Kafka setup

I've not configured MM2 in an active/active architecture before so can't confirm whether you would have two active consumers for each DC or one. Hopefully another member will be able to answer this question for you.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM