
Ordering of data in Kafka for primary and priority queues

I need to process product IDs in order and am planning to use Kafka for this. In case data is lost, either by Kafka or by my own code, I also keep all of these product IDs in my database, so if a record is not processed within a given time, say 24 hours, I need to republish it to a queue, but with priority. Since Kafka has no concept of priority for data in a queue, I can use a second queue that acts as the priority queue.

The problem I am facing is that I need the products ordered in the priority queue as well. With a single queue, I distribute records across partitions based on a hash, and my consumers again process messages in order: each consumer maintains an in-memory queue per thread and distributes data among those queues based on the hash of the product ID. But with two queues, one primary and one priority, I need ordering across these queues as well. So data from both queues should go to the same consumer, so that I can maintain ordering in my code.

Please tell me whether I am on the wrong track, or how I should proceed.

It can be done if you need it. That said, you will not lose data in Kafka if you have a reasonable retention policy and replication factor.


Still, here is how to do it:

1. Setup:

You can have two topics. Let us call them normal and priority. Both have the same number of partitions, let us say 4, and both use the same partitioning strategy, let us say product ID mod 4.
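A minimal sketch of that partitioning rule in plain Python, with no Kafka client involved. The names `partition_for` and `destination` are made up for illustration. With a real client you would probably need to pass the partition explicitly or plug in a custom partitioner to get exactly "product ID mod 4", since the default partitioner hashes the serialized key instead.

```python
from typing import Tuple

NUM_PARTITIONS = 4               # same partition count on both topics
TOPICS = ("normal", "priority")  # topic names used in this answer


def partition_for(product_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a product ID to a partition with the 'product ID mod 4' rule."""
    return product_id % num_partitions


def destination(topic: str, product_id: int) -> Tuple[str, int]:
    """Return the (topic, partition) pair a record for this product goes to."""
    if topic not in TOPICS:
        raise ValueError(f"unknown topic: {topic}")
    return topic, partition_for(product_id)
```

Because both topics share the partition count and the rule, `destination("normal", 7)` and `destination("priority", 7)` end up with the same partition number.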

2. Producer:

Now you have an event for your product with ID 3. It is sent to partition 3 of the normal topic. For whatever reason, you do not receive it. You then republish the event for the same product on the priority topic using the same logic, so it goes to the same partition number.
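The republish step could look roughly like the sketch below. It assumes your database can give you a publish timestamp per product ID and the set of IDs already processed; `stale_product_ids` and `republish_plan` are hypothetical helpers, and the actual send to Kafka is left out.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple

NUM_PARTITIONS = 4                     # same as on the normal topic
REPUBLISH_AFTER = timedelta(hours=24)  # the 24-hour window from the question


def stale_product_ids(
    published_at: Dict[int, datetime],
    processed: Set[int],
    now: datetime,
) -> List[int]:
    """Return unprocessed IDs published more than 24 hours ago, oldest first."""
    stale = [
        (ts, pid)
        for pid, ts in published_at.items()
        if pid not in processed and now - ts > REPUBLISH_AFTER
    ]
    return [pid for _, pid in sorted(stale)]


def republish_plan(stale_ids: List[int]) -> List[Tuple[str, int, int]]:
    """Build (topic, partition, product_id) entries for the priority topic.

    The partition comes from the same mod rule as on the normal topic,
    so each product lands on the same partition number in both topics.
    """
    return [("priority", pid % NUM_PARTITIONS, pid) for pid in stale_ids]
```

Sorting by publish time keeps the republished records in their original order within each partition, which is what the ordering requirement needs.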

3. Consumer:

On the consumer side you have to manually assign specific partitions. Let us say you have two consumers. You assign the first one partitions 0 and 1 of both topics (so it listens to both topics). The second is assigned the remaining partitions, 2 and 3, again of both topics.
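As a rough illustration, the assignment for each consumer can be computed like this. `assign_partitions` is a hypothetical helper, and the sketch assumes the partition count divides evenly among the consumers. With a real client, the resulting list is what you would hand to the consumer's manual assignment call instead of subscribing through a consumer group.

```python
from typing import List, Sequence, Tuple

TOPICS = ("normal", "priority")
NUM_PARTITIONS = 4


def assign_partitions(
    consumer_index: int,
    num_consumers: int,
    topics: Sequence[str] = TOPICS,
    num_partitions: int = NUM_PARTITIONS,
) -> List[Tuple[str, int]]:
    """Give a consumer the same contiguous partition range on every topic."""
    if num_partitions % num_consumers != 0:
        raise ValueError("this sketch assumes partitions divide evenly")
    if not 0 <= consumer_index < num_consumers:
        raise ValueError("consumer_index out of range")
    per_consumer = num_partitions // num_consumers
    start = consumer_index * per_consumer
    return [
        (topic, partition)
        for topic in topics
        for partition in range(start, start + per_consumer)
    ]
```

With two consumers, `assign_partitions(0, 2)` covers partitions 0 and 1 of both topics and `assign_partitions(1, 2)` covers partitions 2 and 3, so every record for a given product reaches the same consumer regardless of which topic carried it.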

I hope this answers your question. Good luck!
