BigQuery: reproducing a query that uses another table in a subselect
I am stuck trying to reproduce in BigQuery a query similar to the following one from MSSQL:
SELECT
COL1,
COL2, COL3,
CASE
WHEN ( COL1 % 2 ) = 0 THEN COL2
ELSE (SELECT TOP 1 COL99 FROM ANOTHER_TABLE AS AT WHERE AT.COL8 = T.COL2 AND AT.COL9 < T.COL3 ORDER BY AT.COL9 DESC)
END AS COL4
FROM TABLE AS T
First, I tried to reproduce the query in BigQuery like this:
SELECT
COL1,
COL2, COL3,
CASE
WHEN ( COL1 % 2 ) = 0 THEN COL2
ELSE (SELECT COL99 FROM PROJECT.DATASET.ANOTHER_TABLE AS AT WHERE AT.COL8 = T.COL2 AND AT.COL9 < T.COL3 ORDER BY AT.COL9 DESC LIMIT 1)
END AS COL4
FROM PROJECT.DATASET.TABLE AS T
But it leads to the error:
Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.
I understand this error, and I agree that the original query is not well optimized, since the subselect may be executed for every row in the table.
Knowing that, I tried the following, which doesn't raise an error but gives wrong results (too many rows):
SELECT
COL1,
COL2, COL3,
CASE
WHEN ( COL1 % 2 ) = 0 THEN COL2
ELSE AT.COL99
END AS COL4
FROM PROJECT.DATASET.TABLE AS T
LEFT JOIN (
SELECT * FROM (
SELECT
COL99,
COL8,
COL9,
ROW_NUMBER() OVER (PARTITION BY COL8 ORDER BY COL9 DESC) AS rn
FROM PROJECT.DATASET.ANOTHER_TABLE
) AS TMP
/*WHERE TMP.rn = 1*/
) AS AT
ON AT.COL8 = T.COL2
AND AT.COL9 < T.COL3
This query returns more rows than expected, which is normal given the condition AND AT.COL9 < T.COL3, but I am struggling to find out how to take the minimum ROW_NUMBER value (rn) to reproduce the TOP 1 of the original query.
I tried to put TMP.rn = 1 in the AT subquery, but the problem is that the first-ranked row is not always the one that satisfies the condition AND AT.COL9 < T.COL3.
To sum up, my goal is to reproduce the first query at the top of this question in BigQuery. I've tried something, but I am stuck on how to take the minimum value of ROW_NUMBER (rn) among the rows matching the condition AND AT.COL9 < T.COL3.
Has anyone had a similar use case by any chance?
Edit: adding the input (TABLE, then ANOTHER_TABLE) and the expected output:
| COL1 | COL2 | COL3 |
|---|---|---|
| 1234 | AAA | 25/12/2022 |
| 1235 | BBB | 25/12/2022 |
| 1236 | CCC | 25/12/2022 |
| 1237 | AAA | 24/12/2022 |
| 1238 | AAA | 23/12/2022 |
| 1239 | AAA | 22/12/2022 |

| COL99 | COL8 | COL9 |
|---|---|---|
| 1111 | AAA | 25/12/2022 |
| 2222 | BBB | 25/12/2022 |
| 3333 | CCC | 25/12/2022 |
| 9999 | AAA | 23/12/2022 |
| 8888 | AAA | 22/12/2022 |
| 7777 | AAA | 21/12/2022 |

| COL1 | COL2 | COL3 | COL4 |
|---|---|---|---|
| 1234 | AAA | 25/12/2022 | AAA |
| 1235 | BBB | 25/12/2022 | NULL |
| 1236 | CCC | 25/12/2022 | CCC |
| 1237 | AAA | 24/12/2022 | 9999 |
| 1238 | AAA | 23/12/2022 | AAA |
| 1239 | AAA | 22/12/2022 | 7777 |
You can use the FIRST_VALUE() window function:
SELECT DISTINCT T.COL1, T.COL2, T.COL3,
CASE
WHEN T.COL1 % 2 = 0 THEN T.COL2
ELSE FIRST_VALUE(AT.COL99) OVER (PARTITION BY T.COL1, T.COL2, T.COL3 ORDER BY AT.COL9 DESC)
END AS COL4
FROM FIRST_TABLE AS T LEFT JOIN ANOTHER_TABLE AS AT
ON AT.COL8 = T.COL2 AND AT.COL9 < T.COL3 AND T.COL1 % 2 <> 0;
If COL1 is unique in the first table, you can simplify the PARTITION BY clause to:
OVER (PARTITION BY T.COL1 ORDER BY AT.COL9 DESC)
See the demo (for MySQL, but it is standard SQL).
The query provided by @forpas returns correct results on my example, but it does not return the result I expect in my real use case.
But @forpas's idea inspired me, and I found a way to solve my problem.
It gives the same result in the demo linked by @forpas, and the query looks like this in MySQL:
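On BigQuery specifically, the same join can probably also be filtered with QUALIFY instead of SELECT DISTINCT plus FIRST_VALUE(). What follows is only a sketch under a few assumptions that are not in the question: the alias A replaces AT (AT is a reserved keyword in BigQuery), MOD() replaces the % operator, COL2 and COL99 are assumed to have the same type (otherwise one side needs a CAST), and (COL1, COL2, COL3) is assumed to identify a single row of the first table.

```sql
-- Sketch: for each row of T, keep only the joined row with the latest COL9.
-- Assumptions (not from the question): alias A instead of AT, MOD() instead
-- of %, COL2 and COL99 share a type, (COL1, COL2, COL3) is unique in T.
SELECT
  T.COL1,
  T.COL2,
  T.COL3,
  CASE
    WHEN MOD(T.COL1, 2) = 0 THEN T.COL2
    ELSE A.COL99
  END AS COL4
FROM `PROJECT.DATASET.TABLE` AS T
LEFT JOIN `PROJECT.DATASET.ANOTHER_TABLE` AS A
  ON A.COL8 = T.COL2
  AND A.COL9 < T.COL3
WHERE TRUE  -- no-op filter, kept in case QUALIFY requires a WHERE/GROUP BY/HAVING
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY T.COL1, T.COL2, T.COL3
  ORDER BY A.COL9 DESC
) = 1;
```

Because the row number is computed after the join, the ranking only sees rows that already satisfy A.COL9 < T.COL3, which is what TOP 1 did in the MSSQL query. Rows of T without any match survive the LEFT JOIN as a single row with NULL in COL99.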
SELECT T.COL1, T.COL2, T.COL3,
CASE
WHEN T.COL1 % 2 = 0 THEN T.COL2
ELSE AT1.COL99
END AS COL4
FROM FIRST_TABLE AS T
LEFT JOIN (
SELECT * FROM (
SELECT
AT.COL99,
T.COL2,
T.COL3,
ROW_NUMBER() OVER (PARTITION BY T.COL3, T.COL2, AT.COL8 ORDER BY AT.COL9 DESC) AS COUNTER
FROM ANOTHER_TABLE AS AT
INNER JOIN FIRST_TABLE AS T
ON AT.COL8 = T.COL2 AND AT.COL9 < T.COL3) TEMP
WHERE TEMP.COUNTER = 1
) AS AT1
ON AT1.COL2 = T.COL2 AND AT1.COL3 = T.COL3 ;
The query might be more complex than necessary; if someone has something more optimized, I would be happy to try it.
Thank you @forpas for the proposal!
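One possibly simpler variant on BigQuery is to join once and pick the latest match per row with ARRAY_AGG(... ORDER BY ... LIMIT 1), avoiding the second join on FIRST_TABLE. The sketch below is self-contained so it can be pasted into the BigQuery console; the inline CTEs copy the sample data from the question, and several details are assumptions: dates are written as DATE literals, COL99 is stored as STRING so that it can share a column with COL2, the alias A is used because AT is a reserved keyword in BigQuery, and (COL1, COL2, COL3) is assumed to identify a row.

```sql
-- Sketch with inline sample data; types and the alias A are assumptions.
WITH FIRST_TABLE AS (
  SELECT * FROM UNNEST(ARRAY<STRUCT<COL1 INT64, COL2 STRING, COL3 DATE>>[
    (1234, 'AAA', DATE '2022-12-25'),
    (1235, 'BBB', DATE '2022-12-25'),
    (1236, 'CCC', DATE '2022-12-25'),
    (1237, 'AAA', DATE '2022-12-24'),
    (1238, 'AAA', DATE '2022-12-23'),
    (1239, 'AAA', DATE '2022-12-22')
  ])
),
ANOTHER_TABLE AS (
  SELECT * FROM UNNEST(ARRAY<STRUCT<COL99 STRING, COL8 STRING, COL9 DATE>>[
    ('1111', 'AAA', DATE '2022-12-25'),
    ('2222', 'BBB', DATE '2022-12-25'),
    ('3333', 'CCC', DATE '2022-12-25'),
    ('9999', 'AAA', DATE '2022-12-23'),
    ('8888', 'AAA', DATE '2022-12-22'),
    ('7777', 'AAA', DATE '2022-12-21')
  ])
)
SELECT
  T.COL1,
  T.COL2,
  T.COL3,
  CASE
    WHEN MOD(T.COL1, 2) = 0 THEN T.COL2
    -- Latest COL99 among the joined rows; NULL when nothing matched.
    ELSE ARRAY_AGG(A.COL99 IGNORE NULLS ORDER BY A.COL9 DESC LIMIT 1)[SAFE_OFFSET(0)]
  END AS COL4
FROM FIRST_TABLE AS T
LEFT JOIN ANOTHER_TABLE AS A
  ON A.COL8 = T.COL2
  AND A.COL9 < T.COL3
GROUP BY T.COL1, T.COL2, T.COL3
ORDER BY T.COL1;
```

On the sample data this should give the same COL4 values as the expected output in the question. Whether it is actually faster than the ROW_NUMBER() version on a large table is something to measure rather than assume.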