I have an SQL query with a subquery running on Spark. I get this error: "Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses". Can you help me find out the reason?
select distinct NAME
from table1, table2 t
where t.ID = (select min(t.ID) from table1 a where a.WID = table1.WID) and
      t.WID = table1.WID and
      t.VID = table1.VID
The full error message is as follows:
"org.apache.spark.sql.AnalysisException: Expressions referencing the outer query are not supported outside of WHERE/HAVING clauses: Aggregate [min(outer(FAILURE_ID#3104)) AS min(outer())#3404]"
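Learn to use proper, explicit, standard JOIN syntax!
The error arises because min(t.ID) aggregates a column of the outer table t in the subquery's SELECT list, and Spark accepts references to the outer query only in a subquery's WHERE/HAVING clauses. You presumably meant min(a.ID).
You can write your query with all table references in the FROM clause: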
select distinct NAME
from table1 t1 join
     table2 t2
     on t2.WID = t1.WID and
        t2.VID = t1.VID join
     (select tt1.WID, min(tt1.id) as min_id
      from table1 tt1
      group by tt1.WID
     ) tt1
     on tt1.WID = t1.WID and tt1.min_id = t1.id;
Or use window functions:
select distinct NAME
from table2 t2 join
     (select t1.*,
             min(t1.id) over (partition by t1.WID) as min_id
      from table1 t1
     ) t1
     on t2.WID = t1.WID and
        t2.VID = t1.VID and
        t1.min_id = t1.id;
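If you want to sanity-check that the join version and the window-function version agree, here is a minimal sketch that runs both on toy data. It assumes SQLite (through Python's sqlite3 module, SQLite 3.25 or later for window functions) as a stand-in for Spark. The rows are made up, and placing NAME in table2 is an assumption, since the question does not say which table it belongs to.

```python
import sqlite3

# Toy data; every row is hypothetical. NAME is assumed to live in table2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (ID integer, WID integer, VID integer);
    create table table2 (ID integer, WID integer, VID integer, NAME text);
    insert into table1 values (1, 10, 100), (2, 10, 100), (3, 20, 200), (4, 20, 201);
    insert into table2 values (7, 10, 100, 'alpha'), (8, 20, 200, 'beta'), (9, 20, 201, 'gamma');
""")

# Rewrite 1: explicit joins plus a grouped derived table.
join_sql = """
    select distinct NAME
    from table1 t1 join
         table2 t2
         on t2.WID = t1.WID and t2.VID = t1.VID join
         (select tt1.WID, min(tt1.id) as min_id
          from table1 tt1
          group by tt1.WID
         ) tt1
         on tt1.WID = t1.WID and tt1.min_id = t1.id
"""

# Rewrite 2: window function computing the per-WID minimum.
window_sql = """
    select distinct NAME
    from table2 t2 join
         (select t1.*,
                 min(t1.id) over (partition by t1.WID) as min_id
          from table1 t1
         ) t1
         on t2.WID = t1.WID and t2.VID = t1.VID and t1.min_id = t1.id
"""

def names(sql):
    """Run a query and return its single column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

join_names = names(join_sql)
window_names = names(window_sql)
print(join_names, window_names)  # ['alpha', 'beta'] ['alpha', 'beta']
```

Only the table1 rows holding the smallest ID for their WID survive, so 'gamma' (matched only by ID 4, which is not the minimum for WID 20) is filtered out by both queries.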
EDIT:
The above assumes a reasonable interpretation of your query. To mimic the logic as written, you can do:
select distinct NAME
from table1 t1 join
     table2 t2
     on t2.WID = t1.WID and
        t2.VID = t1.VID
where t2.ID is not null;
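That is all the subquery is doing: min(t.ID) aggregates a column of the outer table, so for each outer row it just returns that row's own ID, and comparing the ID with itself only fails when it is NULL.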
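The reduction to a NULL check relies on SQL's three-valued logic: a value compared with itself is true unless the value is NULL. A minimal sketch on a hypothetical one-column table, assuming SQLite through Python's sqlite3 module as a stand-in for Spark:

```python
import sqlite3

# Hypothetical table with one NULL among its IDs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table t (ID integer);
    insert into t values (1), (2), (NULL);
""")

# "ID = ID" evaluates to NULL (not true) for the NULL row, so that row is dropped.
self_equal = [row[0] for row in conn.execute(
    "select ID from t where ID = ID order by ID")]

# The explicit NULL check keeps exactly the same rows.
not_null = [row[0] for row in conn.execute(
    "select ID from t where ID is not null order by ID")]

print(self_equal, not_null)  # [1, 2] [1, 2]
```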