
How to scale Kafka stream processing dynamically?

I have a fixed number of partitions for a topic. Producers produce data at varying rates during different hours of the day.

I want to add consumers dynamically, based on the hour of the day, so that records are processed as fast as possible.

For example, say I have 10 partitions of a topic. I want to deploy 5 consumers during off-peak hours and 20 consumers during peak hours.

My problem is that when I have 20 consumers, each consumer will receive duplicate records, which I want to avoid. I want each record to be processed only once so that processing speeds up.

Is there any mechanism to do this?

If you have N partitions, then you can have up to N consumers within the same consumer group, each reading from a single partition. When you have fewer consumers than partitions, some consumers will read from more than one partition. Conversely, if you have more consumers than partitions, the surplus consumers will be inactive and will receive no messages at all.
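As an illustration, here is a minimal Java consumer sketch; the broker address, topic name, and group name (localhost:9092, my-topic, my-processing-group) are placeholders. Every instance started with the same group.id participates in the partition assignment described above, and sharing the group id is also what prevents duplicate consumption:

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class GroupConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            // Every instance that shares this group.id joins the same consumer
            // group, so each partition is assigned to exactly one of them.
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-processing-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("my-topic")); // placeholder topic
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.printf("partition=%d offset=%d value=%s%n",
                                record.partition(), record.offset(), record.value());
                    }
                }
            }
        }
    }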

Therefore, if you want to kick off 20 consumers, you need to increase the number of partitions of the topic to at least 20; otherwise, 10 of your consumers will be inactive.
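Partitions can be added with the kafka-topics.sh CLI or programmatically. As a sketch using the same placeholder broker and topic names, the Java AdminClient can do it. Note that Kafka only allows increasing the partition count, never decreasing it, and that adding partitions changes the key-to-partition mapping for future records:

    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.NewPartitions;

    public class IncreasePartitions {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
            try (AdminClient admin = AdminClient.create(props)) {
                // Grow "my-topic" to 20 partitions so 20 consumers can all be active.
                admin.createPartitions(Map.of("my-topic", NewPartitions.increaseTo(20)))
                     .all()
                     .get(); // block until the brokers apply the change
            }
        }
    }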

With regard to the duplicates you've mentioned: as long as all of your consumers belong to the same consumer group, each message will be consumed only once.

To summarise,

  1. Increase the number of partitions of your topic to 20.
  2. Build a mechanism that creates and kills consumers based on peak/off-peak hours, and make sure that every consumer you kick off joins the existing consumer group so that each message is consumed only once (a minimal sketch follows this list).
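Here is one minimal sketch of such a mechanism, assuming a single JVM, the placeholder names used above, and a hypothetical peak window of 09:00-18:00. In production this is more commonly done by scaling container replicas via an orchestrator, but the group-rebalancing behaviour is the same:

    import java.time.Duration;
    import java.time.LocalTime;
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.errors.WakeupException;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class ConsumerScaler {

        /** One consumer on its own thread; KafkaConsumer is not thread-safe,
         *  so wakeup() is the only method another thread may call on it. */
        static class ConsumerTask extends Thread {
            private final KafkaConsumer<String, String> consumer;

            ConsumerTask() {
                Properties props = new Properties();
                props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
                props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-processing-group");     // shared group
                props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
                props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
                consumer = new KafkaConsumer<>(props);
            }

            @Override
            public void run() {
                try {
                    consumer.subscribe(List.of("my-topic")); // placeholder topic
                    while (true) {
                        consumer.poll(Duration.ofMillis(500))
                                .forEach(r -> System.out.println(r.value())); // process records here
                    }
                } catch (WakeupException e) {
                    // expected: wakeup() aborts the blocking poll() on shutdown
                } finally {
                    consumer.close(); // leaves the group, triggering a rebalance
                }
            }

            void shutdown() {
                consumer.wakeup();
            }
        }

        private final Deque<ConsumerTask> running = new ArrayDeque<>();

        // Hypothetical peak window 09:00-18:00; adjust to your traffic pattern.
        private int desiredCount() {
            int hour = LocalTime.now().getHour();
            return (hour >= 9 && hour < 18) ? 20 : 5;
        }

        private synchronized void rescale() {
            int target = desiredCount();
            while (running.size() < target) {   // scale up: new members join the group
                ConsumerTask task = new ConsumerTask();
                task.start();
                running.push(task);
            }
            while (running.size() > target) {   // scale down: members leave cleanly
                running.pop().shutdown();
            }
        }

        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            scheduler.scheduleAtFixedRate(new ConsumerScaler()::rescale, 0, 5, TimeUnit.MINUTES);
        }
    }

Each start or clean shutdown of a consumer triggers a group rebalance, during which processing briefly pauses, so rescaling infrequently (here, checking every 5 minutes) is preferable to reacting to every small load change.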
