
Apache Flink: how do I map and match an alternate key with a primary key into one keyed stream?

I would like a simpler, better, and more elegant way of approaching the problem below. I have yet to come across any documentation on the topic, and I am sure my current approach has some bottlenecks. Thank you.

I have a stream where JSON is mapped to a POJO:

DataStream<MyPOJO> stream = env
             .addSource( <<kafkaSource>>).map(new EventToPOJO());

Some of the POJOs will have a populated primary key, some will have a populated alternate key, and some will have both. The only example of working with two keys that I have found in the Flink documentation uses a KeySelector for a composite key; there is nothing for alternate keys.

My current approach is as follows:

  1. Use a RichFlatMapFunction to collect all elements with a primary key into a stream, AStream
  2. Use a RichFlatMapFunction to collect all elements with an alternate key into a stream, BStream
  3. Use a RichFlatMapFunction for items that have both primary and alternate keys, CStream
  4. Join AStream with CStream on the primary key
  5. Join BStream with CStream on the alternate key
  6. Finally, keyBy the primary key

 // Elements that carry a primary key
 DataStream<MyPOJO> primaryKey = stream.flatMap(new RichFlatMapFunction<MyPOJO, MyPOJO>() {
            @Override
            public void flatMap(MyPOJO mypojo, Collector<MyPOJO> collector) throws Exception {
                if (mypojo.getPrimaryKey() != null) {
                    collector.collect(mypojo);
                }
            }
        });


 // Elements that carry an alternate key
 DataStream<MyPOJO> alternateKey = stream.flatMap(new RichFlatMapFunction<MyPOJO, MyPOJO>() {
            @Override
            public void flatMap(MyPOJO mypojo, Collector<MyPOJO> collector) throws Exception {
                if (mypojo.getAlternateKey() != null) {
                    collector.collect(mypojo);
                }
            }
        });


 // Elements that carry both keys
 DataStream<MyPOJO> both = stream.flatMap(new RichFlatMapFunction<MyPOJO, MyPOJO>() {
            @Override
            public void flatMap(MyPOJO mypojo, Collector<MyPOJO> collector) throws Exception {
                if (mypojo.getAlternateKey() != null && mypojo.getPrimaryKey() != null) {
                    collector.collect(mypojo);
                }
            }
        });



//Join them 

   both.join(alternateKey)
                .where(MyPOJO::getAlternateKey)
                .equalTo(MyPOJO::getAlternateKey)
                .window(TumblingEventTimeWindows.of(Time.milliseconds(1)))
                .apply(new JoinFunction<MyPOJO, MyPOJO, MyPOJO>() {

                    @Override
                    public MyPOJO join(MyPOJO first, MyPOJO second) throws Exception {

                        // Some join logic to keep both states
                        return second; // placeholder for the merged result
                    }
                });

// ... repeat for the primary key stream


// keyby at the end
both.keyBy(MyPOJO::getPrimaryKey);


I'm sure I could use a filter function as well to get the three streams, but I would prefer not to split into three streams in the first place. Please note that I have simplified the above for readability, so please don't mind any syntax errors you may find.

You should implement a custom POJO that contains the primary and secondary keys. It needs equals() and hashCode() methods that implement your required logic(*) for when two records are equal. See "hashCode() and equals() method for custom classes in flink" for more details on why you have to do this.

Add a MyPOJO.getJoiningKey() that returns this custom POJO.
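
A minimal sketch of what such a key class could look like, in plain Java with no Flink dependency. The names JoiningKey and MyPojoSketch are hypothetical, and the equality rule (both fields must match, nulls included) is only an assumption, since the required logic is not stated in the question. Whatever rule you choose, hashCode() has to return the same value for any two keys that equals() treats as equal.

```java
import java.util.Objects;

// Hypothetical composite key holding both identifiers; either one may be null.
final class JoiningKey {
    private final String primaryKey;
    private final String alternateKey;

    JoiningKey(String primaryKey, String alternateKey) {
        this.primaryKey = primaryKey;
        this.alternateKey = alternateKey;
    }

    String getPrimaryKey() { return primaryKey; }
    String getAlternateKey() { return alternateKey; }

    // Assumed rule: two keys are equal only if both fields match (null-safe).
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof JoiningKey)) return false;
        JoiningKey other = (JoiningKey) o;
        return Objects.equals(primaryKey, other.primaryKey)
                && Objects.equals(alternateKey, other.alternateKey);
    }

    // Built from the same fields as equals(), so equal keys hash identically.
    @Override
    public int hashCode() {
        return Objects.hash(primaryKey, alternateKey);
    }
}

// Hypothetical stand-in for MyPOJO, showing only the getJoiningKey() accessor.
final class MyPojoSketch {
    private final String primaryKey;
    private final String alternateKey;

    MyPojoSketch(String primaryKey, String alternateKey) {
        this.primaryKey = primaryKey;
        this.alternateKey = alternateKey;
    }

    JoiningKey getJoiningKey() {
        return new JoiningKey(primaryKey, alternateKey);
    }
}
```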

Then just do a single join based on .where(r -> r.getJoiningKey()).equalTo(r -> r.getJoiningKey()).

(*) I'm still not sure what you want your logic to be. For example, if the left side has non-null primary and secondary keys, and the right side has a null primary key but a non-null secondary key, what would you want to compare?
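
To make the open question concrete, here is one possible reading, offered purely as an assumption: records that carry both keys define an alternate-to-primary mapping, and a record with only an alternate key adopts the primary key found in that mapping. The sketch below is plain in-memory Java, not Flink code, and the names KeyedRecord and AlternateKeyResolver are hypothetical. It only illustrates the comparison rule, not how the mapping would be maintained in a streaming job.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record with an optional primary key and an optional alternate key.
final class KeyedRecord {
    final String primaryKey;
    final String alternateKey;

    KeyedRecord(String primaryKey, String alternateKey) {
        this.primaryKey = primaryKey;
        this.alternateKey = alternateKey;
    }
}

final class AlternateKeyResolver {
    // Build the alternate-to-primary index from records that carry both keys.
    static Map<String, String> buildIndex(List<KeyedRecord> records) {
        Map<String, String> index = new HashMap<>();
        for (KeyedRecord r : records) {
            if (r.primaryKey != null && r.alternateKey != null) {
                index.put(r.alternateKey, r.primaryKey);
            }
        }
        return index;
    }

    // Assumed rule: prefer the record's own primary key, otherwise look it up
    // through the alternate key. Returns null if no primary key can be found.
    static String resolvePrimaryKey(KeyedRecord r, Map<String, String> index) {
        if (r.primaryKey != null) {
            return r.primaryKey;
        }
        if (r.alternateKey == null) {
            return null;
        }
        return index.get(r.alternateKey);
    }
}
```

Under this reading, every record that resolves to a non-null primary key could be keyed on that value alone. Records that resolve to null would need separate handling, which the question does not specify.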
