
Kafka: Single consumer group, no partitions and multiple topics

I have 22 topics, and ordering within a topic is important to me. I do not have any partitions.
Basically I have 11 tenants and I need two topics per tenant.
I am confused about whether to have a single consumer group for all 22 topics or to have 22 consumer groups.
The load is not high and the consumption is not real-time; it is an offline process, so a lag of a few milliseconds won't hurt.

I am confused about the following points:
1. If I have one consumer group with one consumer running on a single machine (JVM - Spring Boot application), will the consumer work with all topics using a single thread, or will there be a separate thread per topic? If it is a single thread, the thread may get overloaded. If there are multiple threads, I will be able to achieve parallelism (utilize all the cores) without spinning up another machine.
2. If I have one consumer group listening to all topics with multiple consumers running on multiple machines (multiple JVMs - Spring Boot application), will Zookeeper distribute the load from different topics to different machines? I understand that messages from one topic will always go to a single machine.

For example: if there are 2 consumers (one per machine) in a single consumer group listening to all 22 topics, and the 22 topics produce messages simultaneously, will the messages be distributed among the 2 machines, maybe something like topics 1-11 going to machine one and topics 12-22 going to machine two? I am just interested in load distribution.

Does it work this way (assuming equal load from all topics)?
2 machines -> messages from approx. 11 topics per machine
4 machines -> messages from approx. 5 topics per machine, and so on.

First of all, to clarify the concepts:

  • A topic is just a logical unit.
  • Messages are ordered only within a partition.
  • "I do not have any partitions." is not possible. A topic must have at least one partition.
  • A consumer group is used just for horizontal scalability. If you have 5 partitions in your topic and 5 consumers within the same consumer group, Kafka assigns each partition to one consumer and consumption proceeds in parallel.

Answers to your questions:

  1. If you have one consumer, there will be one thread (the Kafka consumer is not thread-safe). If you need parallelism, you need more than one partition in the topic and the same number of consumers in the same consumer group. A consumer can subscribe to multiple topics.
  2. Zookeeper is not used on the consumer side (take a look at this). Kafka itself distributes partitions among the consumers in a group. How evenly depends on the configured partition assignment strategy: the range assignor works topic by topic, so with single-partition topics it can give every partition to the same consumer, whereas the round-robin and sticky assignors spread partitions across all consumers.
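Whether the split across consumers is even in practice depends on the assignment strategy the consumers are configured with. The following is a minimal sketch in plain Java that illustrates the difference. It does not use the Kafka client: `AssignmentSketch`, `rangeStyle` and `roundRobinStyle` are hypothetical names, and the two methods are simplified models of range-style and round-robin-style assignment, not Kafka's actual assignor code. It assumes 22 topics with one partition each, as in the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AssignmentSketch {

    // Builds an empty assignment (consumer -> list of "topic-partition" names).
    static Map<String, List<String>> emptyAssignment(List<String> consumers) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        return result;
    }

    // Simplified model of range-style assignment: every topic is divided on its own,
    // and the first consumers receive the extra partitions. A topic with a single
    // partition therefore always lands on the first consumer.
    public static Map<String, List<String>> rangeStyle(
            List<String> topics, int partitionsPerTopic, List<String> consumers) {
        Map<String, List<String>> result = emptyAssignment(consumers);
        int base = partitionsPerTopic / consumers.size();
        int extra = partitionsPerTopic % consumers.size();
        for (String topic : topics) {
            int next = 0;
            for (int c = 0; c < consumers.size(); c++) {
                int count = base + (c < extra ? 1 : 0);
                for (int i = 0; i < count; i++) {
                    result.get(consumers.get(c)).add(topic + "-" + next);
                    next++;
                }
            }
        }
        return result;
    }

    // Simplified model of round-robin assignment: all topic-partitions form one list
    // and are dealt to the consumers in turn.
    public static Map<String, List<String>> roundRobinStyle(
            List<String> topics, int partitionsPerTopic, List<String> consumers) {
        Map<String, List<String>> result = emptyAssignment(consumers);
        int turn = 0;
        for (String topic : topics) {
            for (int p = 0; p < partitionsPerTopic; p++) {
                result.get(consumers.get(turn % consumers.size())).add(topic + "-" + p);
                turn++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> topics = new ArrayList<>();
        for (int i = 1; i <= 22; i++) {
            topics.add("topic" + i); // hypothetical topic names
        }
        List<String> consumers = List.of("c1", "c2");
        rangeStyle(topics, 1, consumers)
                .forEach((c, parts) -> System.out.println("range-style " + c + ": " + parts.size()));
        roundRobinStyle(topics, 1, consumers)
                .forEach((c, parts) -> System.out.println("round-robin " + c + ": " + parts.size()));
    }
}
```

Under these assumptions, with two consumers the range-style model hands all 22 partitions to `c1`, while the round-robin model gives 11 to each; with four consumers the round-robin model gives 6, 6, 5 and 5. In the real Java client the strategy is selected with the `partition.assignment.strategy` consumer setting.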

Maybe this video can be helpful to understand some core concepts better.

will the consumer work with all topics using a single thread or will there be separate thread per topic?

The answer is a single thread, because the KafkaConsumer documentation says:

The Kafka consumer is NOT thread-safe. All network I/O happens in the thread of the application making the call. It is the responsibility of the user to ensure that multi-threaded access is properly synchronized. Un-synchronized access will result in ConcurrentModificationException.
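If a single thread does become a bottleneck, one common pattern is to keep all consumer calls on one thread and hand the records to worker threads, with one single-threaded worker per topic so that ordering within a topic is preserved. The sketch below shows only that hand-off in plain Java, under stated assumptions: it does not use the Kafka client, `PerTopicDispatchSketch` and `Message` are hypothetical stand-ins, and the loop in `main` stands in for the loop around `poll()`. Offset handling is deliberately left out; in a real consumer, committing offsets before the workers have finished could skip records after a crash.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PerTopicDispatchSketch {

    // Hypothetical stand-in for a consumed record; this is not a Kafka class.
    public static final class Message {
        final String topic;
        final int sequence;

        public Message(String topic, int sequence) {
            this.topic = topic;
            this.sequence = sequence;
        }
    }

    // One single-threaded executor per topic: messages of one topic are processed
    // in submission order, while different topics can be processed in parallel.
    private final Map<String, ExecutorService> workers = new ConcurrentHashMap<>();

    // Records the order in which each topic's messages were processed.
    private final Map<String, List<Integer>> processed = new ConcurrentHashMap<>();

    // Meant to be called from the single polling thread only.
    public void dispatch(Message message) {
        ExecutorService worker = workers.computeIfAbsent(
                message.topic, t -> Executors.newSingleThreadExecutor());
        worker.execute(() -> {
            List<Integer> seen = processed.computeIfAbsent(
                    message.topic, t -> Collections.synchronizedList(new ArrayList<>()));
            seen.add(message.sequence); // the "processing" step of this sketch
        });
    }

    // Stops accepting work, waits for the workers to drain, and returns the results.
    public Map<String, List<Integer>> shutdownAndCollect() {
        for (ExecutorService worker : workers.values()) {
            worker.shutdown();
        }
        try {
            for (ExecutorService worker : workers.values()) {
                worker.awaitTermination(5, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return processed;
    }

    public static void main(String[] args) {
        PerTopicDispatchSketch sketch = new PerTopicDispatchSketch();
        // Stand-in for the poll loop: one thread reads, per-topic workers process.
        for (int seq = 0; seq < 5; seq++) {
            sketch.dispatch(new Message("tenant1-events", seq)); // hypothetical topic names
            sketch.dispatch(new Message("tenant2-events", seq));
        }
        System.out.println(sketch.shutdownAndCollect());
    }
}
```

The design choice here is that parallelism comes from having several topics, not from sharing the consumer between threads, so the thread-safety rule quoted above is still respected.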


If I have one consumer group listening to all topics with multiple consumers running on multiple machines ... will the Zookeeper distribute the load from different topics to different machines?

Yes, although Zookeeper is not the component responsible for this.

Just a note: Kafka doesn't know anything about machines; it knows about consumer groups and consumers.


Now, let's answer the main question.

I am confused about whether to have a single consumer group for all 22 topics or have 22 consumer groups?

Since you have only one partition per topic, having 22 consumers with the same group.id or having 22 consumers each subscribed to only one topic is the same thing, because:

each partition is assigned to exactly one consumer in the group.
