
Flink IntervalJoin invoke timestamp

From the Flink docs: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/operators/joining/#interval-join

IntervalJoin joins elements of two streams that share the same key and whose timestamps fall within the lower/upper boundary relative to each other. Say I have the following streams:

packet: [(id=1, ts=20:00:00,...),(id=1,ts=20:00:01,...),(id=3,ts=20:00:02,...),....]
metadata: [(id=1, ts=20:00:03,...),(id=2,ts=20:00:04,...),....]

packetStream
    .keyBy(elem => elem.id)
    .intervalJoin(metadata.keyBy(elem => elem.id))
    .between(Time.seconds(-2), Time.seconds(1))

When will the ProcessFunction be invoked? I understand that it will be invoked twice for id=1, but when? At 20:00:00 and 20:00:01, or only once the window is closed at 20:00:03?

The processElement method of the ProcessJoinFunction is called as each pair of elements becomes available for processing.

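In the example you've given, it will never be called for id=1 because there's no metadata event within the specified interval of the events at 20:00:00 and 20:00:01. With between(Time.seconds(-2), Time.seconds(1)), a packet with timestamp ts joins only metadata whose timestamp lies in [ts - 2s, ts + 1s] (bounds inclusive by default). That gives [19:59:58, 20:00:01] for the first packet and [19:59:59, 20:00:02] for the second, and the only id=1 metadata event, at 20:00:03, falls outside both.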
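
To check the interval arithmetic by hand, here is a minimal plain-Scala sketch of the pairing condition only. It does not use Flink and does not model watermarks or when processElement fires. The names Event, hms, joins and joinedPairs are made up for illustration, and the extra metadata event at 20:00:01 is hypothetical, added only to show what a match would look like.

```scala
// Plain-Scala model of the interval-join pairing condition; no Flink dependency.
// Timestamps are seconds since midnight. All names here are illustrative.
object IntervalJoinSketch {
  final case class Event(id: Int, ts: Long)

  // Mirrors .between(Time.seconds(-2), Time.seconds(1)) from the question.
  val lowerBound: Long = -2L
  val upperBound: Long = 1L

  // Helper: convert hh:mm:ss to seconds since midnight.
  def hms(h: Int, m: Int, s: Int): Long = h * 3600L + m * 60L + s

  // A (packet, metadata) pair joins when the keys match and
  // packet.ts + lowerBound <= metadata.ts <= packet.ts + upperBound.
  def joins(packet: Event, meta: Event): Boolean =
    packet.id == meta.id &&
      meta.ts >= packet.ts + lowerBound &&
      meta.ts <= packet.ts + upperBound

  // All pairs that satisfy the condition, regardless of arrival order.
  def joinedPairs(packets: Seq[Event], metadata: Seq[Event]): Seq[(Event, Event)] =
    for { p <- packets; m <- metadata if joins(p, m) } yield (p, m)

  // The streams from the question.
  val packets: Seq[Event] = Seq(
    Event(1, hms(20, 0, 0)), Event(1, hms(20, 0, 1)), Event(3, hms(20, 0, 2)))
  val metadata: Seq[Event] = Seq(
    Event(1, hms(20, 0, 3)), Event(2, hms(20, 0, 4)))

  // Hypothetical extra metadata event (not in the question) that falls
  // inside the interval of both id=1 packets.
  val metadataWithMatch: Seq[Event] = metadata :+ Event(1, hms(20, 0, 1))

  def main(args: Array[String]): Unit = {
    println(s"question data: ${joinedPairs(packets, metadata).size} pair(s)")
    println(s"with hypothetical event: ${joinedPairs(packets, metadataWithMatch).size} pair(s)")
  }
}
```

With the question's data no pair qualifies. With the hypothetical metadata event at 20:00:01, both id=1 packets qualify, which is the "invoked twice" case the question expected.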
