Joining more than 2 streams using the same sliding window in Flink

I have 3 streams A, B and C that I am supposed to join into a single stream (let's call it ABC) and then do some operation on it.

It is important that I use sliding windows with size X and slide Y, where Y <= X*3.

All the streams contain a common ID that I use for the join, and X and Y are time parameters defined in seconds.

My current implementation joins streams A and B into AB using a tumbling window with size X, and then joins AB with C using a sliding window with size X and slide Y.

This may lead to incorrect answers in cases such as: Stream A receives a message at time 0, and Stream B receives a message at time Y+1. In this case both messages should go inside the same sliding window because Y+1 < X, but the end result is that when I join AB and C the message from B is missing due to the initial tumbling window.
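To make the mismatch concrete, here is a minimal sketch in plain Java (no Flink dependency) of how timestamps map to tumbling and sliding windows. The values X = 10, Y = 5 and the event times 9 and 11 are hypothetical, chosen so the two events straddle a tumbling-window boundary. The helper names are made up for illustration, and windows are assumed to be aligned to time 0 with no offset.

```java
import java.util.ArrayList;
import java.util.List;

class WindowAssignmentDemo {
    // Start of the tumbling window of length `size` that contains `ts`
    // (assumes windows are aligned to 0, i.e. no offset).
    static long tumblingStart(long ts, long size) {
        return ts - Math.floorMod(ts, size);
    }

    // Starts of every sliding window (length `size`, slide `slide`)
    // that contains `ts`, latest window first.
    static List<Long> slidingStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts, slide);
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long x = 10, y = 5;    // hypothetical window size and slide, in seconds
        long tA = 9, tB = 11;  // hypothetical event times for A and B

        // Different tumbling windows, so a tumbling join never pairs them.
        System.out.println("tumbling window of A starts at " + tumblingStart(tA, x));
        System.out.println("tumbling window of B starts at " + tumblingStart(tB, x));

        // Overlapping sliding windows, so a sliding join would pair them.
        System.out.println("sliding windows of A start at " + slidingStarts(tA, x, y));
        System.out.println("sliding windows of B start at " + slidingStarts(tB, x, y));
    }
}
```

With these values A falls in the tumbling window starting at 0 and B in the one starting at 10, while both fall in the sliding window starting at 5, which is the kind of pair the tumbling first stage drops.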

Can I do a multi-stream join in Flink using a single sliding window, similar to how I would join multiple dataframes in Spark?

I think what will work in this case is to use two sliding window joins -- one to compute AB, and another to join those results with C. The one issue you may have is with the timestamps on the records produced by the first join -- I'm not sure what timestamps Flink will put into the StreamRecords that wrap your AB events, but for normal (non-join) windows, Flink sets the timestamps on the result records to the window end time. This may not be what you want in this case. If this is an issue, you can put an additional timestamp assigner after the first sliding window to set the timestamps appropriately, before the second join (with C).
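As a rough illustration of the timestamp concern, the plain-Java sketch below (no Flink dependency) assumes that the first join stamps each AB record with the largest timestamp still inside its window, which is how I understand the Flink window join documentation. The window parameters, event times, helper names, and the choice of `min(tA, tB)` as a reassigned timestamp are all hypothetical; which timestamp is "appropriate" depends on the semantics you want.

```java
import java.util.ArrayList;
import java.util.List;

class JoinTimestampDemo {
    // Starts of every sliding window (length `size`, slide `slide`)
    // that contains `ts`, latest window first; windows aligned to 0.
    static List<Long> slidingStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts, slide);
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    // Assumed result timestamp of a window join: the largest timestamp
    // that still lies inside the window [start, start + size).
    static long windowMaxTimestamp(long windowStart, long size) {
        return windowStart + size - 1;
    }

    public static void main(String[] args) {
        long size = 10_000, slide = 5_000;  // hypothetical X and Y, in milliseconds
        long tA = 9_000, tB = 11_000;       // hypothetical event times for A and B
        long sharedStart = 5_000;           // A and B share the window [5000, 15000)

        long joined = windowMaxTimestamp(sharedStart, size);
        System.out.println("AB timestamp after the first join: " + joined);
        System.out.println("second-stage windows for it: "
                + slidingStarts(joined, size, slide));

        // One possible reassignment (an assumption, not a recommendation).
        long reassigned = Math.min(tA, tB);
        System.out.println("second-stage windows after reassigning to min(tA, tB): "
                + slidingStarts(reassigned, size, slide));
    }
}
```

With the window-end timestamp, the AB record also lands in the window starting at 10000, which does not contain the original A event at 9000. In a real job the reassignment would go through `assignTimestampsAndWatermarks` with a `WatermarkStrategy` that carries a timestamp assigner, applied to the AB stream before the join with C.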
