I would like a simpler, better, and more elegant way of approaching the problem below. I have yet to come across any documentation on the topic, and I am sure my current approach has some bottlenecks. Thank you.
I have a stream where JSON is mapped to a POJO:
DataStream<MyPOJO> stream = env
    .addSource(<<kafkaSource>>)
    .map(new EventToPOJO());
Some of the POJOs will have a populated primary key, some will have a populated alternate key, and some will have both. The only example of working with two keys that I have found in the Flink documentation uses a KeySelector for a composite key; there is nothing for alternate keys.
My current approach is as follows:
// Records that have a primary key
DataStream<MyPOJO> primaryKey = stream.flatMap(new RichFlatMapFunction<MyPOJO, MyPOJO>() {
    @Override
    public void flatMap(MyPOJO mypojo, Collector<MyPOJO> collector) throws Exception {
        if (mypojo.getPrimaryKey() != null) {
            collector.collect(mypojo);
        }
    }
});
// Records that have an alternate key
DataStream<MyPOJO> alternateKey = stream.flatMap(new RichFlatMapFunction<MyPOJO, MyPOJO>() {
    @Override
    public void flatMap(MyPOJO mypojo, Collector<MyPOJO> collector) throws Exception {
        if (mypojo.getAlternateKey() != null) {
            collector.collect(mypojo);
        }
    }
});
// Records that have both keys
DataStream<MyPOJO> both = stream.flatMap(new RichFlatMapFunction<MyPOJO, MyPOJO>() {
    @Override
    public void flatMap(MyPOJO mypojo, Collector<MyPOJO> collector) throws Exception {
        if (mypojo.getAlternateKey() != null && mypojo.getPrimaryKey() != null) {
            collector.collect(mypojo);
        }
    }
});
// Join them
both.join(alternateKey)
    .where(MyPOJO::getAlternateKey)
    .equalTo(MyPOJO::getAlternateKey)
    .window(TumblingEventTimeWindows.of(Time.milliseconds(1)))
    .apply(new JoinFunction<MyPOJO, MyPOJO, MyPOJO>() {
        @Override
        public MyPOJO join(MyPOJO left, MyPOJO right) throws Exception {
            // Some join logic to keep both states
            return right; // placeholder for the merged record
        }
    });
// ... repeat for the primary key stream ...
// keyBy at the end
both.keyBy(MyPOJO::getPrimaryKey);
I'm sure I could use a filter function as well to get the three streams, but I would prefer not to split into three streams in the first place. Please note that I have simplified the above for readability's sake, so please don't mind any syntax errors you may find.
You should implement a custom POJO that contains the primary and secondary keys. It needs to have equals() and hashCode() methods, which implement your required logic(*) of when two records are equal. See "hashCode() and equals() method for custom classes in flink" for more details on why you have to do this.
Add a MyPOJO.getJoiningKey() method that returns this custom POJO.
Then just do a single join based on .where(r -> r.getJoiningKey()).equalTo(r -> r.getJoiningKey()).
(*) I'm still not sure what you want your logic to be. E.g., if the left-side primary and secondary keys are both non-null, and the right-side primary key is null but its secondary key is non-null, what would you want to compare?
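As a rough illustration of the custom key class, here is a minimal sketch in plain Java. Since the comparison rule is left open above, it assumes one possible rule that is not from the question: use the alternate key when it is populated, otherwise fall back to the primary key. The class name JoiningKey, the factory method of(), and the String key type are all hypothetical. The key is normalized once at construction so that equals() and hashCode() always agree.

```java
import java.util.Objects;

/**
 * Hypothetical joining key. ASSUMED rule (not from the original post):
 * use the alternate key when it is present, otherwise the primary key.
 * Key values are assumed to be Strings.
 */
final class JoiningKey {
    private final char kind;     // 'A' = alternate key, 'P' = primary key
    private final String value;  // the key value that is actually compared

    private JoiningKey(char kind, String value) {
        this.kind = kind;
        this.value = value;
    }

    /** Builds the key from a record's two (possibly null) key fields. */
    static JoiningKey of(String primaryKey, String alternateKey) {
        if (alternateKey != null) {
            return new JoiningKey('A', alternateKey);
        }
        if (primaryKey != null) {
            return new JoiningKey('P', primaryKey);
        }
        throw new IllegalArgumentException("record has neither key populated");
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof JoiningKey)) return false;
        JoiningKey other = (JoiningKey) o;
        return kind == other.kind && value.equals(other.value);
    }

    @Override
    public int hashCode() {
        // Derived from exactly the fields used in equals()
        return Objects.hash(kind, value);
    }
}
```

With something like this, MyPOJO.getJoiningKey() could return JoiningKey.of(getPrimaryKey(), getAlternateKey()). Under the assumed rule, a record carrying both keys matches alternate-key-only records but not primary-key-only records, so a different rule may be needed depending on the intended semantics. Flink also places its own requirements on key types, which this sketch does not try to cover.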