Flink IntervalJoin invoke timestamp
From Flink docs: https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/datastream/operators/joining/#interval-join
IntervalJoin joins elements of two streams that share the same key and whose timestamps fall within the lower/upper bounds. Say I have the following streams:
packet: [(id=1, ts=20:00:00,...),(id=1,ts=20:00:01,...),(id=3,ts=20:00:02,...),....]
metadata: [(id=1, ts=20:00:03,...),(id=2,ts=20:00:04,...),....]
packetStream
  .keyBy(elem => elem.id)
  .intervalJoin(metadata.keyBy(elem => elem.id))
  .between(Time.seconds(-2), Time.seconds(1))
When will the ProcessFunction be invoked? I understand that it will be invoked twice for id=1, but when: at 20:00:00 and 20:00:01, or only once the window is closed at 20:00:03?
The processElement method of the ProcessJoinFunction is called as each pair of elements becomes available for processing.
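To make "as each pair becomes available" concrete, here is a minimal sketch of an interval join for a single key, written in plain Scala with no Flink dependency. It is a simplified model, not Flink's operator code: the class IntervalJoinModel, its methods onLeft and onRight, and the use of plain Long seconds as timestamps are all hypothetical names and assumptions made for illustration. The point it shows is that a pair is emitted when the later-arriving of its two elements shows up, rather than when some window closes.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified model of an interval join for ONE key.
// Timestamps are plain seconds. State cleanup and watermarks are left out.
final class IntervalJoinModel(lowerBound: Long, upperBound: Long) {
  private val leftBuffer = ArrayBuffer.empty[Long]
  private val rightBuffer = ArrayBuffer.empty[Long]

  // A pair matches when left + lowerBound <= right <= left + upperBound.
  private def matches(left: Long, right: Long): Boolean =
    right >= left + lowerBound && right <= left + upperBound

  // Buffers a left element and returns the pairs that can be emitted right away.
  def onLeft(ts: Long): List[(Long, Long)] = {
    leftBuffer += ts
    rightBuffer.filter(r => matches(ts, r)).map(r => (ts, r)).toList
  }

  // Buffers a right element and returns the pairs that can be emitted right away.
  def onRight(ts: Long): List[(Long, Long)] = {
    rightBuffer += ts
    leftBuffer.filter(l => matches(l, ts)).map(l => (l, ts)).toList
  }
}

object IntervalJoinModelDemo {
  def main(args: Array[String]): Unit = {
    val join = new IntervalJoinModel(lowerBound = -2L, upperBound = 1L)
    println(join.onLeft(10L))  // nothing buffered on the right yet
    println(join.onRight(11L)) // pairs with the buffered left element at 10
    println(join.onLeft(12L))  // pairs with the buffered right element at 11
  }
}
```

Each returned pair stands in for one call to processElement in this model; the timestamps 10, 11 and 12 are made-up values, not the ones from the question.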
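In the example you've given, it will never be called for id=1 because there's no metadata event within the specified interval of the events at 20:00:00 and 20:00:01. With between(Time.seconds(-2), Time.seconds(1)), a packet with timestamp t joins metadata whose timestamp lies in [t - 2s, t + 1s], which gives [19:59:58, 20:00:01] and [19:59:59, 20:00:02] for the two packets. The metadata event at 20:00:03 falls outside both.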
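The bounds check for the example data can be reproduced with a few lines of plain Scala. This is only a sketch of the arithmetic: the object IntervalBoundsCheck and the helper joins are hypothetical names, java.time.LocalTime stands in for Flink's event timestamps, and both bounds are treated as inclusive, following the interval definition in the linked docs.

```scala
import java.time.LocalTime

object IntervalBoundsCheck {
  // Bounds taken from .between(Time.seconds(-2), Time.seconds(1)), in seconds.
  val lowerBoundSeconds: Long = -2L
  val upperBoundSeconds: Long = 1L

  // True when metadataTs lies in [packetTs + lower, packetTs + upper], inclusive.
  def joins(packetTs: LocalTime, metadataTs: LocalTime): Boolean = {
    val lo = packetTs.plusSeconds(lowerBoundSeconds)
    val hi = packetTs.plusSeconds(upperBoundSeconds)
    !metadataTs.isBefore(lo) && !metadataTs.isAfter(hi)
  }

  def main(args: Array[String]): Unit = {
    val metadataTs = LocalTime.parse("20:00:03")
    // The two id=1 packets from the question.
    for (p <- List("20:00:00", "20:00:01")) {
      val matched = joins(LocalTime.parse(p), metadataTs)
      println(s"packet $p joins metadata 20:00:03: $matched")
    }
  }
}
```

Under these assumptions, an id=1 packet would need a timestamp between 20:00:02 and 20:00:05 to pair with the metadata event at 20:00:03; the packet at 20:00:02 in the example has id=3, so it is not a candidate.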