
Sort messages across Kafka partitions and put them in another Kafka topic

I have a Kafka topic with X partitions. Each message has a timestamp, ts. Can someone suggest a way to sort all the messages (by ts) across all partitions and put them into a new topic with Y partitions (Y < X, where Y can also be 1)?

During this operation, no new data will be added to the original Kafka topic. I am trying to avoid buffering all the data in a temporary data store in order to sort it. So basically I am looking for an X-way merge on streaming data.

Can someone let me know whether this can be done efficiently in Java using the Kafka Streams API?
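For reference, the X-way merge mentioned above can be sketched in plain Java with a min-heap that holds one head message per partition. This is only a sketch under stated assumptions: each partition is assumed to be already ordered by ts (Kafka guarantees offset order within a partition, not timestamp order), each Iterator stands in for one partition read in offset order, and the names PartitionMerge, Message and the sink parameter are hypothetical, not part of any Kafka API.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.Consumer;

class PartitionMerge {
    /** Hypothetical message type: only ts is used for ordering. */
    static final class Message {
        final long ts;
        final String payload;
        Message(long ts, String payload) { this.ts = ts; this.payload = payload; }
    }

    /** Heap entry: the current head of one partition plus the rest of that partition. */
    private static final class Head {
        final Message message;
        final Iterator<Message> rest;
        Head(Message message, Iterator<Message> rest) { this.message = message; this.rest = rest; }
    }

    /**
     * Merges partitions that are each ASSUMED to be sorted by ts, writing
     * messages to the sink in global ts order. Only one message per
     * partition is held in memory at a time.
     */
    static void merge(List<Iterator<Message>> partitions, Consumer<Message> sink) {
        PriorityQueue<Head> heap =
                new PriorityQueue<>(Comparator.comparingLong((Head h) -> h.message.ts));
        // Seed the heap with the first message of every non-empty partition.
        for (Iterator<Message> partition : partitions) {
            if (partition.hasNext()) {
                heap.add(new Head(partition.next(), partition));
            }
        }
        // Repeatedly emit the smallest head and refill from the same partition.
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            sink.accept(smallest.message);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
    }
}
```

In a real pipeline the sink would presumably be a producer writing to the target topic. With Y = 1 the output order is preserved as is; with Y > 1 there is no ordering guarantee across the target partitions.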

This is my best suggestion, based on my own experience. Since you do not want to buffer everything in one place, you can pick a time interval, say 30 minutes. Pull data from all partitions for as long as the messages fall within that window, say 9:00 AM to 9:30 AM, sort that batch, and write it to the target topic. Then start pulling the next batch, which covers 9:30 AM onwards. Because of delays in your data, you may still receive a 9:27 message after the 9:30 data has started, so after processing this batch the target can contain a 9:29 data point followed by a 9:27 one. However, you will find that all of the data between 9:10 and 9:20 is sorted. The larger the time window you can afford, the more accurate the ordering. If you need 100% sorted output, you may have to iterate over the target data again with a different time window to sort it further.
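A minimal sketch of this windowed approach, in plain Java without any Kafka dependency, might look like the following. The class WindowedSorter, its methods, and the sink parameter are hypothetical names chosen for illustration; timestamps are plain long values in whatever unit the window size uses, and messages are reduced to their timestamp to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

class WindowedSorter {
    private final long windowSize;          // e.g. 30 when timestamps are in minutes
    private final Consumer<Long> sink;      // stands in for the target topic
    private final List<Long> buffer = new ArrayList<>();
    private long windowEnd = Long.MIN_VALUE; // exclusive end of the current window

    WindowedSorter(long windowSize, Consumer<Long> sink) {
        this.windowSize = windowSize;
        this.sink = sink;
    }

    /** Feed timestamps in arrival order (interleaved from all partitions). */
    void accept(long ts) {
        if (ts >= windowEnd) {
            // The first record past the current window closes it.
            flush();
            windowEnd = (Math.floorDiv(ts, windowSize) + 1) * windowSize;
        }
        // Late records (older than the current window) are buffered too,
        // so they come out after the earlier, already flushed window.
        buffer.add(ts);
    }

    /** Call once at the end of the input to emit the last window. */
    void finish() {
        flush();
    }

    private void flush() {
        Collections.sort(buffer);
        buffer.forEach(sink);
        buffer.clear();
    }
}
```

With a window of 30 and arrivals 5, 20, 35, 10, 50, this sketch emits 5, 20, 10, 35, 50: each batch is sorted internally, but the late value 10 lands after 20, which is the residual disorder described above. Only one window's worth of data is held in memory at a time.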

