Flink IntervalJoin invocation timestamp

Question

From Flink docs: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/operators/joining/#interval-join

IntervalJoin joins elements of two streams that share the same key and whose timestamps fall within the lower/upper boundary. Say I have the following streams:

packet: [(id=1, ts=20:00:00,...),(id=1,ts=20:00:01,...),(id=3,ts=20:00:02,...),....]
metadata: [(id=1, ts=20:00:03,...),(id=2,ts=20:00:04,...),....]

packetStream
    .keyBy(elem => elem.id)
    .intervalJoin(metadata.keyBy(elem => elem.id))
    .between(Time.seconds(-2), Time.seconds(1))

When will the ProcessFunction be invoked? I understand that it will be invoked twice for id=1, but when? At 20:00:00 and 20:00:01, or only once the window is closed at 20:00:03?

Answer 1

The processElement method of the ProcessJoinFunction is called as each pair of elements becomes available for processing, that is, as soon as the later-arriving element of a matching pair reaches the operator. It does not wait for a window to close.

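In the example you've given, it will never be called for id=1, because there's no metadata event within the specified interval of the packet events at 20:00:00 and 20:00:01. With between(Time.seconds(-2), Time.seconds(1)), the packet at 20:00:00 only joins metadata with timestamps in [19:59:58, 20:00:01], and the packet at 20:00:01 only joins metadata in [19:59:59, 20:00:02]. The id=1 metadata event at 20:00:03 falls outside both ranges.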
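
To check the bounds by hand, here is a minimal plain-Scala sketch of the documented interval condition (right timestamp within [left timestamp + lowerBound, left timestamp + upperBound], inclusive by default). It does not use Flink at all; the Event type, the joins and matches helpers, and the seconds-since-midnight encoding are assumptions made up for illustration.

```scala
// Plain-Scala sketch of the interval-join condition; no Flink dependency.
object IntervalJoinCheck {
  // Hypothetical event type; ts is seconds since midnight.
  final case class Event(id: Int, ts: Int)

  // True when the keys match and right.ts lies in
  // [left.ts + lower, left.ts + upper], both bounds inclusive.
  def joins(left: Event, right: Event, lower: Int, upper: Int): Boolean =
    left.id == right.id &&
      right.ts >= left.ts + lower &&
      right.ts <= left.ts + upper

  // All (packet, metadata) pairs that satisfy the condition.
  def matches(lefts: Seq[Event], rights: Seq[Event],
              lower: Int, upper: Int): Seq[(Event, Event)] =
    for {
      l <- lefts
      r <- rights
      if joins(l, r, lower, upper)
    } yield (l, r)

  def main(args: Array[String]): Unit = {
    val base = 20 * 3600 // 20:00:00
    val packets = Seq(Event(1, base), Event(1, base + 1), Event(3, base + 2))
    val metadata = Seq(Event(1, base + 3), Event(2, base + 4))
    // between(Time.seconds(-2), Time.seconds(1)) corresponds to lower = -2, upper = 1.
    // The result is empty: no pair from the question's data satisfies the bounds.
    println(matches(packets, metadata, lower = -2, upper = 1))
  }
}
```

By the same check, a hypothetical id=1 metadata event at 20:00:01 would pair with both packets, and processElement would then be called once per pair.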

Answer 1, 2022-12-21 11:38:46
