How to fix “Expressions referencing the outer query…” error in Spark-SQL?

Question

I have a SQL query with a subquery running on Spark, and I get this error: "Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses". Can you help me find out the reason?

  select distinct NAME from table1, table2 t
  where t.ID = (select min(t.ID) from table1 a where a.WID = table1.WID) and
        t.WID = table1.WID and
        t.VID = table1.VID

The error message is as follows:

"org.apache.spark.sql.AnalysisException: Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses: Aggregate [min(outer(FAILURE_ID#3104)) AS min(outer())#3404]"

Answer 1

Learn to use proper, explicit, standard JOIN syntax!

You can write your query with all table references in the FROM clause:

select distinct NAME
from table1 t1 join
     table2 t2
     on t2.WID = t1.WID  and 
        t2.VID = t1.VID join
     (select tt1.WID, min(tt1.id) as min_id
      from table1 tt1
      group by tt1.WID
     ) tt1
     on tt1.WID = t1.WID and tt1.min_id = t1.id;

Or use window functions:

select distinct NAME
from table2 t2 join
     (select t1.*,
             min(t1.id) over (partition by t1.WID) as min_id
      from table1 t1
     ) t1 
     on t2.WID = t1.WID  and 
        t2.VID = t1.VID and
        t1.min_id = t1.id;
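
A quick way to convince yourself that the two rewrites above agree is to run both on a few rows. This is only a sketch: it uses Python's built-in sqlite3 rather than Spark, and the table layout (including NAME living in table2) and the sample rows are assumptions, since the question does not show the schema. The derived table is aliased m here purely for readability.

```python
import sqlite3

# Hypothetical toy schema and rows; the real table definitions are not in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (ID integer, WID integer, VID integer);
    create table table2 (ID integer, WID integer, VID integer, NAME text);
    insert into table1 values (1, 10, 100), (2, 10, 101), (3, 20, 200), (4, 20, 201);
    insert into table2 values (7, 10, 100, 'alpha'), (8, 10, 101, 'beta'),
                              (9, 20, 200, 'gamma'), (9, 20, 201, 'delta');
""")

# Rewrite 1: join against a per-WID aggregate.
join_version = """
    select distinct NAME
    from table1 t1 join
         table2 t2
         on t2.WID = t1.WID and t2.VID = t1.VID join
         (select tt1.WID, min(tt1.ID) as min_id
          from table1 tt1
          group by tt1.WID
         ) m
         on m.WID = t1.WID and m.min_id = t1.ID
"""

# Rewrite 2: compute the per-WID minimum with a window function.
window_version = """
    select distinct NAME
    from table2 t2 join
         (select t1.*, min(t1.ID) over (partition by t1.WID) as min_id
          from table1 t1
         ) t1
         on t2.WID = t1.WID and t2.VID = t1.VID and t1.min_id = t1.ID
"""

join_names = sorted(row[0] for row in conn.execute(join_version))
window_names = sorted(row[0] for row in conn.execute(window_version))
print(join_names, window_names)
```

Both queries keep only the table1 row holding the smallest ID within each WID, so they should return the same names on any data, not just on these rows.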

EDIT:

The above assumes a reasonable interpretation of your query. To mimic the logic as written, you can do:

select distinct NAME
from table1 t1 join
     table2 t2
     on t2.WID = t1.WID  and 
        t2.VID = t1.VID 
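where t2.ID is not null;

That is all the subquery is doing. In the original query, min(t.ID) aggregates a column of the outer table t (table2, aliased t2 here), so the comparison reduces to t.ID = t.ID, which fails only when t.ID is null.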
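
As for the reason: the error text shows min(outer(...)), meaning the aggregate inside the subquery is taken over a column of the outer query (t.ID, where t is table2) instead of the inner table a. If min(a.ID) was what you meant, the smallest change is to aggregate the inner column. The sketch below runs that corrected form with Python's built-in sqlite3, not Spark, on made-up rows; the schema is an assumption, and the outer table1 is aliased t1 only to keep the references unambiguous. My understanding is that Spark accepts this shape as a correlated scalar subquery, but please verify on your Spark version.

```python
import sqlite3

# Hypothetical toy schema and rows; the real table definitions are not in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (ID integer, WID integer, VID integer);
    create table table2 (ID integer, WID integer, VID integer, NAME text);
    insert into table1 values (1, 10, 100), (2, 10, 101), (3, 20, 200), (4, 20, 201);
    insert into table2 values (1, 10, 100, 'alpha'), (2, 10, 101, 'beta'),
                              (3, 20, 200, 'gamma'), (5, 20, 201, 'delta');
""")

# Same shape as the question's query, but the aggregate is over the inner
# table's column (a.ID) instead of the outer table's column (t.ID).
fixed_query = """
    select distinct NAME
    from table1 t1, table2 t
    where t.ID = (select min(a.ID) from table1 a where a.WID = t1.WID) and
          t.WID = t1.WID and
          t.VID = t1.VID
"""

names = sorted(row[0] for row in conn.execute(fixed_query))
print(names)
```

Only the correlation predicate (a.WID = t1.WID) refers to the outer query here; the aggregate itself no longer does, which is what the error message is complaining about.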
