Event-Time merge of two Kafka topics using Kafka Streams DSL

I am looking for a way to merge two Kafka topics based on event time.

For example, I have two topics with the following schema: {event-key} :: {event-time-as-value}

topic I -  { {1 :: 12:00pm} {2 :: 12:10pm} {3 :: 14:50pm} {4 :: 15:00pm} }
topic II - { {1 :: 13:00pm} {2 :: 13:10pm} {3 :: 15:50pm} {4 :: 16:00pm} }

The expected output should look like this:

{ {1 :: 12:00pm} {2 :: 12:10pm} {1 :: 13:00pm} {2 :: 13:10pm} {3 :: 14:50pm} {4 :: 15:00pm} {3 :: 15:50pm} {4 :: 16:00pm} }

Is there a way to do this using the Kafka Streams DSL?

Note: there is a good chance that the original topics are not ordered by event time, and that's OK. I would like the algorithm to always pick the earlier of the two events currently at the head of each topic (the same way the "merge two sorted arrays" algorithm works).
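The head-to-head rule described above can be sketched in plain Java. This is an illustration of the "merge two sorted arrays" idea only, not Kafka Streams' internal code. The class name EventTimeMerge, the Event record, and the encoding of event times as minutes since midnight (12:00 → 720, 13:00 → 780, and so on) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class EventTimeMerge {

    // Hypothetical event type: a key plus its event time (minutes since midnight in this example).
    public record Event(int key, long eventTime) {}

    // Repeatedly emit the earlier of the two current heads, as when merging two sorted arrays.
    // Ties go to the first input. Inputs need not be sorted: only the two heads are ever compared.
    public static List<Event> merge(List<Event> first, List<Event> second) {
        List<Event> out = new ArrayList<>(first.size() + second.size());
        int i = 0, j = 0;
        while (i < first.size() && j < second.size()) {
            if (first.get(i).eventTime() <= second.get(j).eventTime()) {
                out.add(first.get(i++));
            } else {
                out.add(second.get(j++));
            }
        }
        // One input is exhausted; drain the other in its original order.
        while (i < first.size()) out.add(first.get(i++));
        while (j < second.size()) out.add(second.get(j++));
        return out;
    }

    public static void main(String[] args) {
        // Topic I: 12:00, 12:10, 14:50, 15:00; topic II: 13:00, 13:10, 15:50, 16:00.
        List<Event> topicI = List.of(new Event(1, 720), new Event(2, 730), new Event(3, 890), new Event(4, 900));
        List<Event> topicII = List.of(new Event(1, 780), new Event(2, 790), new Event(3, 950), new Event(4, 960));
        for (Event e : merge(topicI, topicII)) {
            System.out.println(e.key() + " :: " + e.eventTime());
        }
    }
}
```

Run on the question's data, the keys come out as 1, 2, 1, 2, 3, 4, 3, 4, matching the expected output. In an unbounded stream, an input with no data yet cannot be told apart from one that has ended, so a streaming implementation cannot simply drain the other side the way this sketch does; that is the situation max.task.idle.ms is meant to address.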

Kafka Streams (as of version 2.1.0) implements exactly the algorithm you describe. Hence, a simple:

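import java.util.Arrays;
import org.apache.kafka.streams.StreamsBuilder;

StreamsBuilder builder = new StreamsBuilder();
// One stream over both input topics; records are picked by timestamp within each task.
builder
    .stream(Arrays.asList("firstInputTopic", "secondInputTopic"))
    .to("outputTopicName");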

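should do what you want. Note that the program merges data on a per-partition basis: partition 0 of the first topic is merged with partition 0 of the second topic, and so on. There is no global ordering across partitions.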

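Also consider the configuration max.task.idle.ms, which controls how long a task waits for data on an empty input partition before it processes records from its other partitions.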

For more details, read the corresponding KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-353%3A+Improve+Kafka+Streams+Timestamp+Synchronization

Additionally, you need to implement and configure a custom TimestampExtractor that reads the timestamp from the record value.
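A minimal configuration sketch, assuming the settings are supplied as java.util.Properties. It uses the plain string keys so it compiles without the Kafka jar; in a real application you would more likely use the StreamsConfig constants (StreamsConfig.DEFAULT_TIMESTAMP_EXTRACTOR_CONFIG, StreamsConfig.MAX_TASK_IDLE_MS_CONFIG) and pass the result to new KafkaStreams(builder.build(), props). The class com.example.ValueTimestampExtractor is hypothetical: it stands for your own implementation of org.apache.kafka.streams.processor.TimestampExtractor whose extract(ConsumerRecord<Object, Object> record, long partitionTime) returns the event time taken from record.value(). The application id, broker address, and the 1000 ms idle value are placeholders.

```java
import java.util.Properties;

public class MergeAppConfig {

    // Builds a Kafka Streams configuration using the documented string keys.
    public static Properties build() {
        Properties props = new Properties();
        props.put("application.id", "merge-by-event-time"); // placeholder application id
        props.put("bootstrap.servers", "localhost:9092");    // placeholder broker address
        // Hypothetical class: your own TimestampExtractor that reads the event time from the value.
        props.put("default.timestamp.extractor", "com.example.ValueTimestampExtractor");
        // Wait up to 1 s for data on every input partition before processing anyway (illustrative value).
        props.put("max.task.idle.ms", "1000");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

A larger max.task.idle.ms makes the timestamp-ordered merge more likely to hold when one topic lags behind, at the cost of added latency while the task waits.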
