I have a source of events that looks like this:
class Event {
String userName;
String webPage;
}
I need to enrich my stream of events with the user's past web page accesses. (I have this information in a DB and can use it as a Flink source.)
class EventStats {
String userName;
Map<String,Integer> webPageCounters;
}
How do I make sure that the enrichment data is ready before I start processing the Event stream?
I do not want to make DB calls from inside my stream.
This may be a struggle to do with Flink, to be honest. The first idea that comes to mind is to scan the DB when the job starts and create a separate stream from the result. That stream could be used for initialization, and you could simply union it with the actual EventStats stream, but this is not currently possible due to this issue. So there are basically two solutions.
The first one is quite simple. If you are doing the join manually, you can keep in state the elements from the Event stream that do not yet have a matching EventStats. When an EventStats arrives, you check whether there are any buffered Event elements for that user that can now be emitted. You should probably also add logic that removes elements from state after some time if they are never matched.
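The buffering logic described above can be sketched in plain Java, without any Flink dependency. This is only an illustration under assumptions: `BufferingEnricher`, `EnrichedEvent`, `onEvent`, `onStats`, and `expire` are hypothetical names, and the in-memory maps stand in for what would be keyed state (with timers for cleanup) in a real Flink job.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

class Event {
    final String userName;
    final String webPage;
    Event(String userName, String webPage) { this.userName = userName; this.webPage = webPage; }
}

class EventStats {
    final String userName;
    final Map<String, Integer> webPageCounters;
    EventStats(String userName, Map<String, Integer> webPageCounters) {
        this.userName = userName;
        this.webPageCounters = webPageCounters;
    }
}

// Hypothetical output type: an event paired with the stats of its user.
class EnrichedEvent {
    final Event event;
    final EventStats stats;
    EnrichedEvent(Event event, EventStats stats) { this.event = event; this.stats = stats; }
}

// Hypothetical manual join: buffers events per user until that user's stats arrive.
class BufferingEnricher {
    private static final class Pending {
        final Event event;
        final long arrivedAt;
        Pending(Event event, long arrivedAt) { this.event = event; this.arrivedAt = arrivedAt; }
    }

    private final Map<String, EventStats> statsByUser = new HashMap<>();
    private final Map<String, List<Pending>> pendingByUser = new HashMap<>();
    private final long ttlMillis;

    BufferingEnricher(long ttlMillis) { this.ttlMillis = ttlMillis; }

    // Emit immediately if stats are known, otherwise buffer the event.
    List<EnrichedEvent> onEvent(Event e, long now) {
        List<EnrichedEvent> out = new ArrayList<>();
        EventStats stats = statsByUser.get(e.userName);
        if (stats != null) {
            out.add(new EnrichedEvent(e, stats));
        } else {
            pendingByUser.computeIfAbsent(e.userName, k -> new ArrayList<>()).add(new Pending(e, now));
        }
        return out;
    }

    // Store the stats and flush every event that was waiting for this user.
    List<EnrichedEvent> onStats(EventStats s) {
        statsByUser.put(s.userName, s);
        List<EnrichedEvent> out = new ArrayList<>();
        List<Pending> pending = pendingByUser.remove(s.userName);
        if (pending != null) {
            for (Pending p : pending) {
                out.add(new EnrichedEvent(p.event, s));
            }
        }
        return out;
    }

    // Drop buffered events that have waited at least ttlMillis; returns how many were dropped.
    int expire(long now) {
        int dropped = 0;
        Iterator<List<Pending>> it = pendingByUser.values().iterator();
        while (it.hasNext()) {
            List<Pending> list = it.next();
            int before = list.size();
            list.removeIf(p -> now - p.arrivedAt >= ttlMillis);
            dropped += before - list.size();
            if (list.isEmpty()) {
                it.remove();
            }
        }
        return dropped;
    }
}
```

The trade-off of this approach is that events for users whose stats never arrive sit in state until they expire, so the timeout has to be chosen with the expected DB scan duration in mind.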
The other solution is a little trickier, but also more elegant. You can implement a custom operator that implements InputSelectable in such a way that it first consumes everything from the EventStats stream and only then starts reading elements from the Event stream. There are some caveats with this approach; refer to the documentation for more info. Also note that InputSelectable was introduced in Flink 1.9.
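The control flow of such an operator can be modelled in plain Java. This is not Flink code: `Selection`, `StatsFirstOperator`, and `SelectionDriver` are local stand-ins invented for this sketch, with the driver loop playing the role of the runtime that asks the operator which input to read next. It also assumes the EventStats input is bounded, so that it can signal its end.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Local stand-in for an input selection; not the Flink class.
enum Selection { FIRST, SECOND }

// Models an operator that reads the stats input to its end before touching events.
class StatsFirstOperator {
    private final Map<String, Map<String, Integer>> statsByUser = new HashMap<>();
    private final List<String> output = new ArrayList<>();
    private boolean statsFinished = false;

    // Ask for the first (stats) input until it has ended, then for the second (events).
    Selection nextSelection() {
        return statsFinished ? Selection.SECOND : Selection.FIRST;
    }

    void processStats(String userName, String webPage, int count) {
        statsByUser.computeIfAbsent(userName, k -> new HashMap<>()).put(webPage, count);
    }

    // Called once by the driver when the bounded stats input is exhausted.
    void endStatsInput() {
        statsFinished = true;
    }

    void processEvent(String userName, String webPage) {
        Map<String, Integer> counters =
                statsByUser.getOrDefault(userName, Collections.<String, Integer>emptyMap());
        int pastVisits = counters.getOrDefault(webPage, 0);
        output.add(userName + " " + webPage + " pastVisits=" + pastVisits);
    }

    List<String> output() {
        return output;
    }
}

// Stand-in for the runtime: it only reads from the input the operator selects.
class SelectionDriver {
    // Stats records are {user, page, count}; event records are {user, page}.
    static List<String> run(Queue<String[]> statsInput, Queue<String[]> eventInput) {
        StatsFirstOperator op = new StatsFirstOperator();
        while (true) {
            if (op.nextSelection() == Selection.FIRST) {
                String[] s = statsInput.poll();
                if (s == null) {
                    op.endStatsInput();
                } else {
                    op.processStats(s[0], s[1], Integer.parseInt(s[2]));
                }
            } else {
                String[] e = eventInput.poll();
                if (e == null) {
                    break;
                }
                op.processEvent(e[0], e[1]);
            }
        }
        return op.output();
    }
}
```

Because no event is read until the stats input has ended, every event sees the complete enrichment data and nothing needs to be buffered inside the operator.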