Is there any way to maintain message ordering between partitions of a Kafka topic with a single consumer?

We are developing a Kafka-based streaming system in which the producer writes to multiple partitions within its topic and a single consumer consumes from that topic. I know that Kafka maintains message order within a partition, but can we maintain a global message order across the partitions of a topic?

Short answer: no, Kafka does not provide any ordering guarantees between partitions.

Long answer: I don't quite understand your problem. If only one consumer is consuming your topic, why would you have more than one partition in that topic and then reinvent the wheel trying to maintain order between partitions? If you want to leave room for future growth, e.g. adding another consumer to consume a subset of the partitions, then you'll have to rethink your "global message order" idea.

Do you really need ALL messages to be processed in order? Or could you partition by client/application/whatever and maintain order per partition? In most cases you don't really need a global message order; you just have to partition your data properly.

Maintaining order between multiple consumers is a really tough problem to solve, and even if you solve it correctly you'll just negate all of Kafka's benefits.

You can't benefit from Kafka if you want global ordering across more than one partition. Kafka only guarantees message ordering within a single partition. In our company, we only need messages of the same category to be sent to the same partition, which is easy to do by partitioning with partitionId.

The purpose of partitions in Kafka is to create a partial order of messages in a broader topic, where the messages follow a strict total order within any given partition. So the answer is 'no': it would defeat the purpose of partitions if any notion of cross-partition order were introduced.
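To make the partial-order point concrete, here is a small simulation in plain Python, not using any Kafka client. The two in-memory lists stand in for partitions, and `consume_in_batches` is a hypothetical fetch pattern; a real consumer's interleaving depends on its fetch behaviour, so treat this only as a sketch of the idea.

```python
def consume_in_batches(partitions, batch_size):
    """Read from each partition in turn, batch_size records at a time.

    `partitions` maps a partition id to its log (a list in append order).
    Returns (partition_id, record) pairs in the order they were consumed.
    """
    offsets = {p: 0 for p in partitions}
    consumed = []
    while any(offsets[p] < len(log) for p, log in partitions.items()):
        for p, log in sorted(partitions.items()):
            batch = log[offsets[p]:offsets[p] + batch_size]
            consumed.extend((p, record) for record in batch)
            offsets[p] += len(batch)
    return consumed


# Records 0..5 were produced in that global order, spread round-robin
# over two partitions (an assumption made for this illustration).
partitions = {0: [0, 2, 4], 1: [1, 3, 5]}
consumed = consume_in_batches(partitions, batch_size=2)

global_view = [record for _, record in consumed]
partition_0_view = [record for p, record in consumed if p == 0]
partition_1_view = [record for p, record in consumed if p == 1]
```

Each per-partition view comes back in append order, while the combined view does not match the original produce order, even though only one consumer is reading.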

I would suggest instead focusing on how messages (records, in Kafka parlance) are keyed, which effectively determines how they are mapped to a partition. Which partition specifically doesn't matter, as long as the mapping is deterministic and repeatable. All you should care about is that identically keyed records will always appear on the same partition and, hence, will not be assigned to multiple consumers at the same time (within the same consumer group).
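A minimal sketch of a deterministic key-to-partition mapping, in plain Python. It uses CRC32 as a stand-in hash purely for illustration; Kafka's default partitioner uses a different hash, and `NUM_PARTITIONS`, `partition_for`, and `route` are hypothetical names, not part of any Kafka API.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical partition count


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition deterministically (CRC32 stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(records):
    """Append each (key, value) record to the partition its key maps to."""
    partitions = defaultdict(list)
    for key, value in records:
        partitions[partition_for(key)].append((key, value))
    return partitions


records = [
    ("order-1", "created"),
    ("order-2", "created"),
    ("order-1", "paid"),
    ("order-2", "shipped"),
    ("order-1", "shipped"),
]
partitions = route(records)

# All events for one key land on one partition, in the order they were sent.
order_1_partition = partition_for("order-1")
order_1_events = [v for k, v in partitions[order_1_partition] if k == "order-1"]
```

Because the mapping depends only on the key, every "order-1" event ends up on the same partition and keeps its relative order there, regardless of which partition number that happens to be.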

If you are publishing updates to persisted entities, the primary key of the entity is typically a good starting point for a Kafka record key. If there needs to be some order of updates across a connected graph of entities, then taking the ID of the root of the graph and making it the key will likely satisfy your ordering needs.
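As a rough illustration of keying by the root of an entity graph, the sketch below walks from any entity up to its root and uses the root's ID as the record key. The `Entity` type, its `parent` link, and the sample IDs are assumptions invented for this example; a real domain model may represent the graph differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """Hypothetical persisted entity with an optional link to its parent."""
    entity_id: str
    parent: Optional["Entity"] = None


def record_key(entity: Entity) -> str:
    """Return the ID of the graph's root, to be used as the record key."""
    node = entity
    while node.parent is not None:
        node = node.parent
    return node.entity_id


order = Entity("order-42")
line_item = Entity("item-7", parent=order)
shipment = Entity("shipment-3", parent=line_item)

keys = [record_key(e) for e in (order, line_item, shipment)]
```

Updates to the order, its line item, and the shipment all share one key here, so they would map to one partition and be consumed in the order they were published.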
