Apache Flink: NullPointerException in DataSet API Outer Join

Question

I'm trying to implement the following simple query in Flink's DataSet API.

select 
    t1_value1 
from  
    table1 
where  
    t1_suppkey not in ( 
        select  
            t2_suppkey
        from  
            table2
     )

So my idea was to perform a left outer join (table1.leftOuterJoin(table2)...) and then drop every row that has a value for both t1_suppkey and t2_suppkey.

So I tried it like this:

    output = table1
        .leftOuterJoin(table2).where("t1_suppkey").equalTo("t2_suppkey")
        .with((Table1 t1, Table2 t2) -> new Tuple2<>(t1.ps_suppkey, t2.s_suppkey))
        .returns(new TypeHint<Tuple2<Integer, Integer>>() {});

However, if I do it like this, it always fails with a java.lang.NullPointerException, and I'm not sure why. If I use a normal join instead of a left outer join, the code works, but that's not what I want.

Do I need to implement the left join differently, or is there a simpler way to rewrite the "not in" statement in the DataSet API?

Answer 1

The outer join of the DataSet API also calls the JoinFunction for outer-side records that find no matching record on the inner side. In that case, JoinFunction.join() is called with null in place of the missing record.

Since you are using a LEFT OUTER JOIN, the second argument, Table2 t2, can be null. The NullPointerException is caused by the access to t2.s_suppkey. Check for t2 == null and access t2 only if it is not null.
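To make the null check concrete, here is a minimal plain-Java sketch that does not use Flink at all. The leftOuterJoin helper only imitates the behaviour described above (the join function receives null when nothing matches). The Table1 and Table2 classes, their field names, and the NO_MATCH placeholder of -1 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

class NullSafeJoinSketch {
    // Hypothetical stand-ins for the Table1 and Table2 types from the question.
    static final class Table1 {
        final int t1_suppkey;
        final int t1_value1;
        Table1(int suppkey, int value1) { this.t1_suppkey = suppkey; this.t1_value1 = value1; }
    }

    static final class Table2 {
        final int t2_suppkey;
        Table2(int suppkey) { this.t2_suppkey = suppkey; }
    }

    // Arbitrary placeholder for "no matching row" (an assumption, not a Flink convention).
    static final int NO_MATCH = -1;

    // The guarded join logic: t2 is only dereferenced after the null check.
    static int[] joinPair(Table1 t1, Table2 t2) {
        int rightKey = (t2 == null) ? NO_MATCH : t2.t2_suppkey;
        return new int[] {t1.t1_suppkey, rightKey};
    }

    // Plain-Java imitation of left outer join semantics: the function is called
    // once per matching pair, or once with null when no right row matches.
    static <R> List<R> leftOuterJoin(List<Table1> left, List<Table2> right,
                                     BiFunction<Table1, Table2, R> fn) {
        List<R> out = new ArrayList<>();
        for (Table1 l : left) {
            boolean matched = false;
            for (Table2 r : right) {
                if (l.t1_suppkey == r.t2_suppkey) {
                    out.add(fn.apply(l, r));
                    matched = true;
                }
            }
            if (!matched) {
                out.add(fn.apply(l, null));
            }
        }
        return out;
    }

    // Joins three left rows against one right row and formats each pair as "left,right".
    static List<String> demo() {
        List<Table1> table1 = Arrays.asList(new Table1(1, 100), new Table1(2, 200), new Table1(3, 300));
        List<Table2> table2 = Arrays.asList(new Table2(2));
        List<String> out = new ArrayList<>();
        for (int[] pair : leftOuterJoin(table1, table2, NullSafeJoinSketch::joinPair)) {
            out.add(pair[0] + "," + pair[1]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the null check in joinPair, the rows with keys 1 and 3 would trigger the same NullPointerException as in the question.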

You can also implement the NOT IN join with a FlatJoinFunction, which takes a Collector argument, and emit t1 only if t2 == null.
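The collector-based variant can be sketched the same way in plain Java, again without Flink. The FlatJoin interface below is a hypothetical stand-in for a join function with a collector argument, and the Table1 and Table2 classes are assumed shapes. The point is only that emitting a row when t2 == null, and nothing otherwise, yields the NOT IN result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

class AntiJoinSketch {
    // Hypothetical stand-ins for the Table1 and Table2 types from the question.
    static final class Table1 {
        final int t1_suppkey;
        final int t1_value1;
        Table1(int suppkey, int value1) { this.t1_suppkey = suppkey; this.t1_value1 = value1; }
    }

    static final class Table2 {
        final int t2_suppkey;
        Table2(int suppkey) { this.t2_suppkey = suppkey; }
    }

    // Hypothetical collector-style join function: it may emit zero or more results per call.
    interface FlatJoin<O> {
        void join(Table1 t1, Table2 t2, Consumer<O> out);
    }

    // Plain-Java imitation of a left outer join that passes null for unmatched left rows.
    static <O> List<O> leftOuterJoin(List<Table1> left, List<Table2> right, FlatJoin<O> fn) {
        List<O> out = new ArrayList<>();
        for (Table1 l : left) {
            boolean matched = false;
            for (Table2 r : right) {
                if (l.t1_suppkey == r.t2_suppkey) {
                    fn.join(l, r, out::add);
                    matched = true;
                }
            }
            if (!matched) {
                fn.join(l, null, out::add);
            }
        }
        return out;
    }

    // NOT IN: keep t1_value1 only for left rows whose key never appears in table2.
    static List<Integer> demo() {
        List<Table1> table1 = Arrays.asList(new Table1(1, 100), new Table1(2, 200), new Table1(3, 300));
        List<Table2> table2 = Arrays.asList(new Table2(2), new Table2(2));
        return leftOuterJoin(table1, table2, (t1, t2, out) -> {
            if (t2 == null) {
                out.accept(t1.t1_value1);
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Matched rows emit nothing, so duplicate keys in table2 do not produce duplicate output rows.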

Another option is to use Flink's batch SQL support, which supports the query in your example.

Answer 2

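output = table1
    .leftOuterJoin(table2)
    .where("t1_suppkey").equalTo("t2_suppkey")
    .with((Table1 t1, Table2 t2, Collector<Tuple2<Integer, Integer>> c) -> {
        // t2 is null when no row in table2 matches t1, which is the NOT IN case.
        if (t2 == null) {
            c.collect(new Tuple2<>(t1.t1_suppkey, t1.t1_value1));
        }
        // Matched rows are dropped by emitting nothing.
    })
    // As in the question's code, declare the result type explicitly for the lambda.
    .returns(new TypeHint<Tuple2<Integer, Integer>>() {});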

2 answers

solution1
0 ACCEPTED 2017-10-31 08:49:03

solution2
0 2017-11-02 11:58:59