Join on Multiple Kafka Topics

We have a Flink application that performs window-based joins by key on two Kafka topics. The join configuration is as follows:

window-type: TumblingWindow
window-duration: 10s
allowed-lateness: 10s
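
To make the configuration concrete, the following Java sketch models the arithmetic an event-time tumbling window applies to a single record with these settings. It computes the window the record falls into and the watermark value at which that window is purged. This is a simplified illustration, not Flink code. The class and method names are hypothetical, and it assumes non-negative epoch-millisecond timestamps and no window offset. The cleanup rule (window end minus 1 ms, plus allowed lateness) is modeled on how Flink's window operator decides that a window is too late.

```java
// Simplified model of event-time tumbling-window arithmetic (not Flink code).
// Assumes non-negative epoch-millisecond timestamps and a window offset of 0.
final class TumblingWindowMath {
    static final long WINDOW_SIZE_MS = 10_000;       // window-duration: 10s
    static final long ALLOWED_LATENESS_MS = 10_000;  // allowed-lateness: 10s

    // Start (inclusive) of the tumbling window that contains the timestamp.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_SIZE_MS);
    }

    // End (exclusive) of that window.
    static long windowEnd(long timestampMs) {
        return windowStart(timestampMs) + WINDOW_SIZE_MS;
    }

    // Once the watermark reaches this value, the window's state is purged and
    // further records for it are discarded as late.
    static long cleanupTime(long timestampMs) {
        return windowEnd(timestampMs) - 1 + ALLOWED_LATENESS_MS;
    }

    public static void main(String[] args) {
        long ts = 1_600_000_012_345L;
        System.out.println("window = [" + windowStart(ts) + ", " + windowEnd(ts) + ")");
        System.out.println("purged once watermark >= " + cleanupTime(ts));
    }
}
```

The key point is that only the record's own timestamp and the watermark appear in these formulas. The wall clock plays no part unless the watermark itself is derived from it.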

The problem happens when we set the streams to start from the earliest offset. It seems that the window boundaries are still based on the system clock, so the earliest events get rejected: depending on the Kafka retention period, they can be as old as 14 days.

Is there a suggested way to deal with this, or is there a gap in my understanding?

I assume you've configured your environment to use EventTime, and that you are assigning timestamps and watermarks from data contained in the records you read from Kafka. If so, the windows are driven by those timestamps rather than by the system clock, and it should work properly.
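
The difference between a clock-driven watermark and a record-driven watermark can be illustrated with a minimal Java model. It assumes the same 10 s windows and 10 s allowed lateness as the question. The class and method names are hypothetical, and the check simplifies Flink's behavior: a record is dropped once the watermark reaches its window's end minus 1 ms plus the allowed lateness.

```java
import java.time.Duration;
import java.time.Instant;

// Simplified model: is a record dropped as late, given the current watermark?
// Assumes 10 s tumbling windows, 10 s allowed lateness, epoch-millisecond timestamps.
final class LatenessCheck {
    static final long WINDOW_SIZE_MS = 10_000;
    static final long ALLOWED_LATENESS_MS = 10_000;

    static boolean isDropped(long eventTimestampMs, long watermarkMs) {
        long windowEnd = eventTimestampMs - (eventTimestampMs % WINDOW_SIZE_MS) + WINDOW_SIZE_MS;
        long cleanupTime = windowEnd - 1 + ALLOWED_LATENESS_MS;
        return cleanupTime <= watermarkMs;
    }

    public static void main(String[] args) {
        long now = Instant.now().toEpochMilli();
        long oldEvent = now - Duration.ofDays(14).toMillis();

        // Watermark tied to the wall clock: a 14-day-old record is far behind it.
        System.out.println("clock-based watermark drops it: " + isDropped(oldEvent, now));

        // Watermark derived from the records themselves: when replaying from the
        // earliest offset, the watermark trails the old records being read.
        long eventTimeWatermark = oldEvent - 1;
        System.out.println("event-time watermark drops it: " + isDropped(oldEvent, eventTimeWatermark));
    }
}
```

With a watermark tied to the wall clock, the 14-day-old record is dropped. With a watermark that trails the replayed records, the same record is still accepted into its window.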

Note that if one topic's events are much older than the other's, some of the older events will still be rejected as late. If you don't mind adding latency, you can use a BoundedOutOfOrdernessTimestampExtractor to set the timestamps and watermarks, and set the max out-of-orderness to the maximum time skew between the two topics. If you do that, then I think you'd want to use 0 for allowed lateness.
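
The effect of a bounded out-of-orderness watermark on two skewed topics can be sketched with a small Java model. The model makes several assumptions:

- Records from both topics pass through one timestamp/watermark assigner in the interleaved order shown.
- Topic B runs 60 s ahead of topic A.
- Windows are 10 s and allowed lateness is 0.
- The watermark is updated after every record, whereas Flink emits watermarks periodically.

The class and method names are hypothetical. In newer Flink versions the same idea is exposed as `WatermarkStrategy.forBoundedOutOfOrderness`.

```java
// Simplified model of a bounded-out-of-orderness watermark over one stream in
// which records from two topics are interleaved. Not Flink's implementation.
final class BoundedOutOfOrdernessModel {
    static final long WINDOW_SIZE_MS = 10_000;  // 10 s tumbling windows, allowed lateness 0

    // Counts records whose window was already closed by the watermark when they
    // arrived. With allowed lateness 0, a window closes once watermark >= windowEnd - 1.
    static int countDropped(long[] arrivalTimestampsMs, long maxOutOfOrdernessMs) {
        long maxSeen = Long.MIN_VALUE;
        long watermark = Long.MIN_VALUE;
        int dropped = 0;
        for (long ts : arrivalTimestampsMs) {
            long windowEnd = ts - (ts % WINDOW_SIZE_MS) + WINDOW_SIZE_MS;
            if (windowEnd - 1 <= watermark) {
                dropped++;  // its window has already fired and been purged
            }
            // Watermark = highest timestamp seen so far minus the allowed skew.
            maxSeen = Math.max(maxSeen, ts);
            watermark = maxSeen - maxOutOfOrdernessMs;
        }
        return dropped;
    }

    public static void main(String[] args) {
        // Arrival order: topic A records (0 s, 1 s, 2 s) interleaved with
        // topic B records that are 60 s newer.
        long[] arrivals = {0, 60_000, 1_000, 61_000, 2_000, 62_000};
        System.out.println("bound 0 s:  dropped " + countDropped(arrivals, 0));       // 2
        System.out.println("bound 60 s: dropped " + countDropped(arrivals, 60_000));  // 0
    }
}
```

Setting the bound to the skew (60 s) lets every topic A record in. The cost is that each window fires roughly 60 s later in event time, which is the latency trade-off mentioned above.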

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 