
How does Flink handle late events when a side-output DataStream is used?

It looks to me like Flink handles late events in three ways:

  1. Dropping late events once the window has expired (the default).
  2. Updating the window by including late events via the "allowed lateness" mechanism.
  3. Redirecting late events into another DataStream using the "side output" mechanism.
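Mechanisms 2 and 3 can be combined on the same window. A minimal sketch with Flink's Java DataStream API (the `Event`/`Result` types, key selector, and `MyWindowFunction` are placeholders, not from the original question):

```java
// Tag identifying the side output that will receive events dropped as late.
final OutputTag<Event> lateTag = new OutputTag<Event>("late-events") {};

SingleOutputStreamOperator<Result> results = events
    .keyBy(e -> e.getKey())
    .window(TumblingEventTimeWindows.of(Time.minutes(5)))
    .allowedLateness(Time.minutes(1))   // mechanism 2: late firings within 1 minute
    .sideOutputLateData(lateTag)        // mechanism 3: anything later goes to the tag
    .process(new MyWindowFunction());

// Events too late even for allowedLateness end up in this separate stream.
DataStream<Event> lateStream = results.getSideOutput(lateTag);
```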

Let's assume I have an event-time job that consumes data from Kafka and processes a window every 5 minutes. Now suppose I redirect late events into another DataStream.

  • Is this new DataStream independent?
  • Is it possible to assign a window to this stream so that these late events are processed, say, every hour?
  • If so, is the memory freed after that window fires?

Thank you all!

The stream of late events coming from the window operator is an independent stream that contains only events that were so late that the allowed lateness wasn't enough to accommodate them.

You can do whatever you want with this stream, including sending it through another window. But these events will still be late, so you'll need either to re-generate watermarks using a more relaxed strategy, so they are no longer considered late, or to extend the allowed lateness of this new window.
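As a sketch of the first option (again assuming a placeholder `Event` type with an accessible timestamp, and the `lateTag` side output from before), one way is to re-assign watermarks on the late stream with a large bounded out-of-orderness, so these events fall within the new pipeline's watermark tolerance:

```java
DataStream<Event> lateStream = results.getSideOutput(lateTag);

lateStream
    // Re-generate watermarks with a much more relaxed out-of-orderness bound,
    // so the redirected events are no longer behind the watermark here.
    .assignTimestampsAndWatermarks(
        WatermarkStrategy
            .<Event>forBoundedOutOfOrderness(Duration.ofHours(1))
            .withTimestampAssigner((event, ts) -> event.getTimestamp()))
    .keyBy(e -> e.getKey())
    // Process the accumulated late events once per hour, as asked above.
    .window(TumblingEventTimeWindows.of(Time.hours(1)))
    .process(new MyLateWindowFunction());
```

The bound of one hour here is illustrative; it only needs to be generous enough to cover how late these events actually arrive relative to each other.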

Windows always clean up after themselves: once a window has fired and its allowed lateness has passed, its state is purged, so yes, the memory is freed.
