Kafka-streams: Why do all partitions get assigned to the same consumer in the consumer group?
Background

Several machines generate events. These events get sent to our Kafka cluster, where each machine has its own topic (app.machine-events.machine-name). Because order is important on a per-machine basis, and partition size is not an issue for now, all topics consist of a single partition. Therefore N topics currently also means N partitions.
The consuming/processing app makes use of kafka-streams, which we've given the StreamsConfig.APPLICATION_ID_CONFIG / "application.id" 'machine-event-processor'. This remains the same for each instance, meaning they all get put into the same Kafka consumer group. This consumer is subscribed to the pattern app.machine-events.*, as it does not matter to the processor which machine's events it processes. This is verified by

./kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group machine-event-processor --members --verbose

which shows me a list matching the number and IPs of all running processing services.
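For reference, a minimal sketch of how that application id is set in code (property names taken from StreamsConfig; the full simplified code is shown further below):

import java.util.Properties
import org.apache.kafka.streams.StreamsConfig

val settings = new Properties
// The application.id is also used as the consumer group.id, so every instance
// configured with this value joins the same consumer group.
settings.put(StreamsConfig.APPLICATION_ID_CONFIG, "machine-event-processor")
settings.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092")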
Expected

Given 20 machines and 5 instances of the processor, we'd expect each processor to handle ~4 partitions (and therefore ~4 topics).
Actually

There's one processor handling all 20 partitions (and therefore all 20 topics), with the 4 other processors doing nothing at all/idling. Killing the 'lucky' processor, all 20 partitions get rebalanced to another processor, resulting in the new processor handling 20 partitions/topics and 3 processors idling.
What I've tried so far

settings.put(StreamsConfig.consumerPrefix(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG), new RoundRobinAssignor().getClass.getName)

(Tried several values, as nothing seems to change.)

The code, simplified
import java.util.Properties
import java.util.regex.Pattern
import org.apache.kafka.streams.{KafkaStreams, StreamsBuilder}
import org.apache.kafka.streams.kstream.KStream

val streamConfig = new Properties
// Populated elsewhere; effective settings: {producer.metadata.max.age.ms=5000, consumer.metadata.max.age.ms=5000, default.key.serde=org.apache.kafka.common.serialization.Serdes$StringSerde, consumer.partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor, bootstrap.servers=kafka:9092, application.id=machine-event-processor, default.value.serde=org.apache.kafka.common.serialization.Serdes$ByteArraySerde}
val builder: StreamsBuilder = new StreamsBuilder
// Subscribe to all per-machine topics via a pattern
val topicStream: KStream[String, Array[Byte]] = builder.stream(Pattern.compile("app.machine-events.*"))
topicStream.process(new MessageProcessorSupplier(context)) // The event is delegated to a processor, doing the actual processing logic
val eventStreams = new KafkaStreams(builder.build(), streamConfig)
eventStreams.start()
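As a side note (not part of the original code), one way we could check which tasks and partitions each instance actually got is to query the local thread metadata once the instance is up, e.g.:

import scala.collection.JavaConverters._

// Diagnostic sketch: print the tasks (and their topic-partitions) assigned to
// this instance. Run it once the KafkaStreams instance has reached RUNNING.
eventStreams.localThreadsMetadata().asScala.foreach { thread =>
  thread.activeTasks().asScala.foreach { task =>
    println(s"${thread.threadName()} -> ${task.topicPartitions()}")
  }
}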
Notes

Kafka-streams 2.0.0 is being used:

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-streams</artifactId>
  <version>2.0.0</version>
</dependency>
Kafka is being run inside a container, using the wurstmeister/kafka:2.11-2.0.0 version. The docker-stack.yml service:

kafka:
  image: wurstmeister/kafka:2.11-2.0.0
  ports:
    - target: 9094
      published: 9094
      protocol: tcp
      mode: host
  volumes:
    - /var/run/docker.sock:/var/run/docker.sock
  healthcheck:
    test: ["CMD-SHELL", "$$(netstat -ltn | grep -q 9092)"]
    interval: 15s
    timeout: 10s
    retries: 5
  environment:
    HOSTNAME_COMMAND: "docker info | grep ^Name: | cut -d' ' -f 2"
    KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
    KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS: 36000
    KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
    KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092,OUTSIDE://_{HOSTNAME_COMMAND}:9094
    KAFKA_LISTENERS: INSIDE://:9092,OUTSIDE://:9094
    KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
    KAFKA_DEFAULT_REPLICATION_FACTOR: 2
  deploy:
    replicas: 2
    restart_policy:
      condition: on-failure
      delay: 5s
      max_attempts: 3
      window: 120s
The replication factor is set to 2, so each partition should have a replica on each node.

Relevant topics/questions/discussions I've found and checked
https://faust.readthedocs.io/en/latest/developerguide/partition_assignor.html
Checked out the Kafka mail archives but did not find anything there
Checked out stream example apps
All-round searching for others that ran into this issue, but that did not give me the answers I need. Also found KAFKA-7144, but this should not be an issue for us as we're running 2.0.0.
If anyone has run into similar issues, or is able to point out my probably very stupid mistake, please enlighten me!
For future readers running into this same issue, the solution was to not use N topics each having 1 partition, but to use 1 topic with N partitions. Even with, say, 120 partitions and 400+ machines/event-sources, multiple event types will be put into the same partition, but this does not affect the order of the events.

The implementation was to set the record key to the machine name, and let the underlying logic take care of which value goes to which partition. Since we now have a consumer group with X consumers subscribed to this topic, the partitions are divided evenly over the consumers, each taking 120/X partitions.

This was as Matthias suggested, which was further confirmed by other helpful people from Confluent at Devoxx Belgium 2018. Thank you!
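For illustration, a minimal sketch of that keying on the producer side; the topic name app.machine-events and the sendEvent helper are assumptions here, but with the machine name as record key the default partitioner sends all of one machine's events to the same partition, preserving their order:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerConfig, ProducerRecord}
import org.apache.kafka.common.serialization.{ByteArraySerializer, StringSerializer}

val props = new Properties
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092")
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, classOf[StringSerializer].getName)
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, classOf[ByteArraySerializer].getName)
val producer = new KafkaProducer[String, Array[Byte]](props)

// Key = machine name: the default partitioner hashes the key, so every event
// from one machine lands in the same partition and stays in order.
def sendEvent(machineName: String, payload: Array[Byte]): Unit =
  producer.send(new ProducerRecord[String, Array[Byte]]("app.machine-events", machineName, payload))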
Tip

When using the wurstmeister/kafka docker image, consider using the environment property:

KAFKA_CREATE_TOPICS: "app.machine-events:120:2"

meaning

topic-name:number-of-partitions:replication-factor
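In the docker-stack.yml shown above, that property would go in the kafka service's environment block, for example (sketch):

kafka:
  environment:
    KAFKA_CREATE_TOPICS: "app.machine-events:120:2"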