How does Flink deal with late events when a side-output DataStream is used?
It looks to me like Flink handles late events in 3 ways:

1. Dropping them (the default behavior for event-time windows).
2. Configuring an allowed lateness, so the window fires again when late events arrive within that bound.
3. Redirecting them into another DataStream via a side output.
Let's assume that I have an event-time job that consumes data from Kafka and processes a window every 5 minutes. Now, suppose that I redirect late events into another DataStream.
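The setup described above can be sketched roughly like this (Flink DataStream API; the `Event` type, field names, and `MyWindowFunction` are placeholders, not part of the question):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

// Tag identifying the side output that will receive the late events.
// (Anonymous subclass is required so the type information is retained.)
OutputTag<Event> lateTag = new OutputTag<Event>("late-events") {};

SingleOutputStreamOperator<Result> results = events
    .keyBy(e -> e.key)
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .sideOutputLateData(lateTag)   // redirect events too late for this window
    .process(new MyWindowFunction());

// The redirected late events form their own DataStream:
DataStream<Event> lateStream = results.getSideOutput(lateTag);
```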
Thank you all!
The stream of late events coming from the window operator is an independent stream that contains only events that were so late that the allowed lateness wasn't enough to accommodate them.
You can do whatever you want with this stream, including sending it through another window. But these events will still be late with respect to the original watermarks, so you'll need to either re-generate watermarks using a more relaxed strategy so that the events are no longer late, or extend the allowed lateness of this new window.
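The second-chance processing described above can be sketched as follows. The 30-minute out-of-orderness bound and lateness are illustrative values, and `Event`, its fields, and `MyWindowFunction` are assumed placeholders; in practice you would use one of the two options, not necessarily both:

```java
import java.time.Duration;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

DataStream<Result> secondChance = lateStream
    // Option 1: re-generate watermarks with a much more relaxed
    // out-of-orderness bound, so these events are no longer late.
    .assignTimestampsAndWatermarks(
        WatermarkStrategy
            .<Event>forBoundedOutOfOrderness(Duration.ofMinutes(30))
            .withTimestampAssigner((event, ts) -> event.timestamp))
    .keyBy(e -> e.key)
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    // Option 2: extend the allowed lateness of this new window instead.
    .allowedLateness(Time.minutes(30))
    .process(new MyWindowFunction());
```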
Windows always clean up after themselves.