
Kafka Partitioning for a Single Producer, Topic and Broker

I am quite new to Kafka and have a question about the relationship/mapping between Producer, Topic, Broker and Partition. In the case where I have a single Producer, a single Topic and a single Broker, does it make sense to create multiple partitions for the Topic on that single Broker? If yes, how does this help in terms of parallelism/performance?

Thanks.

Even if you have a single Producer, a single Topic and a single Broker, it makes sense to create multiple partitions for the Topic for the sake of parallelism/performance on the consumer side. If you have multiple consumers in a single consumer group and multiple partitions in the topic, Kafka assigns each partition to exactly one consumer in the group, so the consumers read from different partitions. This gives you parallelism and a performance boost when processing data from Kafka.
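A minimal sketch of that assignment rule, assuming a simple round-robin strategy. Kafka ships several assignors (range, round-robin, sticky and others, selected via `partition.assignment.strategy`); `assign_round_robin` below is a hypothetical stand-in, not the client's API. It shows that partitions are split disjointly among the consumers of one group, so the useful parallelism is at most min(consumers, partitions).

```python
def assign_round_robin(partitions, consumers):
    """Hypothetical helper: spread partitions over the consumers of one group.

    Each partition goes to exactly one consumer; a consumer may own several
    partitions, and consumers beyond the partition count receive none.
    """
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


if __name__ == "__main__":
    partitions = [0, 1, 2, 3]
    print(assign_round_robin(partitions, ["c1", "c2"]))
    # {'c1': [0, 2], 'c2': [1, 3]}
    print(assign_round_robin(partitions, ["c1", "c2", "c3", "c4", "c5"]))
    # {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [3], 'c5': []}
```

With four partitions, a fifth consumer in the same group sits idle, which is why the partition count caps consumer-side parallelism.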

The first thing to understand is that a topic partition is the unit of parallelism in a Kafka cluster. On both the Producer and the Broker, writes to different partitions can proceed in parallel, which helps when they involve expensive operations such as compression. At the consumer end, each partition's data is given to a single consumer thread.
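To illustrate the producer side, here is a rough sketch of how records are grouped into one batch per partition, with each batch compressed independently. It is a simplified model, not the Kafka client: the real Java producer's default partitioner hashes the record key with murmur2, whereas `zlib.crc32` is used here purely as an assumption to keep the example dependency-free. `choose_partition`, `batch_by_partition` and `compress_batch` are hypothetical names.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # assumption: the topic was created with 3 partitions


def choose_partition(key: bytes, num_partitions: int) -> int:
    # Stand-in for the default partitioner: same key -> same partition.
    # (The real client uses murmur2; crc32 is only an illustrative substitute.)
    return zlib.crc32(key) % num_partitions


def batch_by_partition(records, num_partitions):
    """Group (key, value) records into one batch per partition, keeping send order."""
    batches = defaultdict(list)
    for key, value in records:
        batches[choose_partition(key, num_partitions)].append(value)
    return dict(batches)


def compress_batch(batch):
    # Each batch is compressed on its own, so batches for different
    # partitions do not depend on each other and could be handled in parallel.
    return zlib.compress("\n".join(batch).encode())


records = [(b"user-1", "login"), (b"user-2", "click"), (b"user-1", "logout")]
batches = batch_by_partition(records, NUM_PARTITIONS)
compressed = {p: compress_batch(b) for p, b in batches.items()}
```

Because both events for `user-1` map to the same partition, they stay in send order within that partition's batch.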

In your scenario, you would benefit from having multiple partitions on the topic, consumed by multiple consumers within a single consumer group. That way you can achieve maximum throughput in your application. If you use only a single consumer thread for all the partitions, the extra partitions give you no consumer-side parallelism. In short, more partitions can lead to higher throughput, provided you manage your cluster resources carefully.
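A small sketch of "one worker per partition", using an in-memory dictionary as a hypothetical stand-in for the partitions' records. In a real Java application you would typically run several consumer instances (or threads that each own their own `KafkaConsumer`, since that class is not thread-safe) in the same group rather than share one consumer across a thread pool; this only models the idea. Note that Python threads mainly help when processing is I/O-bound.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-in for the records held by each partition.
partition_data = {
    0: ["a0", "a1", "a2"],
    1: ["b0", "b1"],
    2: ["c0", "c1", "c2", "c3"],
}


def process_partition(partition, records):
    # One worker handles one partition sequentially, so the records of
    # that partition are processed in their original order.
    return partition, [r.upper() for r in records]


def consume_in_parallel(data, max_workers):
    # More workers than partitions would just sit idle, mirroring extra
    # consumers in a group that get no partitions assigned.
    workers = max(1, min(max_workers, len(data)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(process_partition, p, recs) for p, recs in data.items()]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    print(consume_in_parallel(partition_data, max_workers=3))
```

With `max_workers=1` the result is the same, but all partitions are drained one after another by a single worker, which is the "single consumer thread" case from the answer.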

In addition to the previous answers, it is important to remember that Kafka only guarantees ordering within a single partition, so consuming from multiple partitions does not preserve the overall order of the messages/events. You may have to take this into account if your application depends on messages being processed in the correct order.
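A toy model of this effect, assuming a simplified key-to-partition mapping (sum of the key's bytes modulo the partition count) in place of Kafka's hash-based default partitioner. The consumer here reads partition 0 to the end before partition 1, which is one legal consumption order: the global send order is lost, but events that share a key stay in order because they live in the same partition.

```python
from itertools import chain

# Hypothetical events as (key, sequence number, action), in the order sent.
events = [
    ("acct-A", 1, "open"),
    ("acct-B", 2, "open"),
    ("acct-A", 3, "deposit"),
    ("acct-B", 4, "withdraw"),
    ("acct-A", 5, "close"),
]


def split_by_key(events, num_partitions):
    # Assumption: simplified partitioner (byte sum mod n), standing in for
    # Kafka's hash-based default; same key always maps to the same partition.
    parts = [[] for _ in range(num_partitions)]
    for ev in events:
        parts[sum(ev[0].encode()) % num_partitions].append(ev)
    return parts


parts = split_by_key(events, 2)
# A consumer draining partition 0 first, then partition 1.
consumed = list(chain.from_iterable(parts))
print([seq for _, seq, _ in consumed])  # [2, 4, 1, 3, 5]
```

If the application needs related events in order, giving them the same key keeps them in one partition; if it needs a total order across all events, a single partition is the only option.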
