Refactor SQL Query, using CTE vs SubQuery
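I am creating a dataset from an S3 bucket and am trying to improve the performance of my query. I currently have two approaches that work, but I would like to see a better query and learn how to improve my SQL skills. Apologies for not including a sample dataset: because the data is extracted from .json files in S3, I have not found a practical way to provide mock data.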
Query #1
WITH block_1 AS (
    SELECT
        VALUE:COL1 AS COL1,
        VALUE:COL2 AS COL2,
        VALUE:COL3 AS COL3,
        VALUE:COL4 AS COL4
    FROM '@S3_BUCKET/',
        LATERAL FLATTEN(input => $1:value)
),
block_2 AS (
    SELECT
        VALUE:COL1 AS COL1,
        MAX(VALUE:COL4) AS MaxCOL4
    FROM '@S3_BUCKET/',
        LATERAL FLATTEN(input => $1:value)
    GROUP BY COL1
)
-- block_2 exposes only COL1 and MaxCOL4, so the output columns are taken from block_1
SELECT a.COL1, a.COL2, a.COL3, a.COL4
FROM block_1 AS a
JOIN block_2 AS b
    ON a.COL1 = b.COL1 AND a.COL4 = b.MaxCOL4
;
Query #2, which I feel is an improvement, especially because you do not need to list the columns you want in the final SELECT statement (as I did above):
SELECT a.*
FROM (
    SELECT
        VALUE:COL1 AS COL1,
        VALUE:COL2 AS COL2,
        VALUE:COL3 AS COL3,
        VALUE:COL4 AS COL4
    FROM '@S3_BUCKET/',
        LATERAL FLATTEN(input => $1:value)
) a
JOIN (
    SELECT COL1, MAX(COL4) COL4
    FROM (
        SELECT
            VALUE:COL1 AS COL1,
            VALUE:COL2 AS COL2,
            VALUE:COL3 AS COL3,
            VALUE:COL4 AS COL4
        FROM '@S3_BUCKET/',
            LATERAL FLATTEN(input => $1:value)
    )
    GROUP BY COL1
) b
    ON a.COL1 = b.COL1 AND a.COL4 = b.COL4;
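The two queries above are my current attempts. Is there a way to make this query better? Another route I considered is using "WHERE ... IN" with a list of COL1 values, but I would essentially still have to hit S3 twice, as in the queries above.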
You should be able to use window functions, specifically RANK(), to simplify this query:
WITH block_1 AS (
    SELECT
        VALUE:COL1 AS COL1,
        VALUE:COL2 AS COL2,
        VALUE:COL3 AS COL3,
        VALUE:COL4 AS COL4,
        RANK() OVER (PARTITION BY VALUE:COL1 ORDER BY VALUE:COL4 DESC) AS rk
    FROM '@S3_BUCKET/',
        LATERAL FLATTEN(input => $1:value)
)
SELECT COL1, COL2, COL3, COL4
FROM block_1
WHERE rk = 1
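Thanks to Snowflake's QUALIFY clause, this can be simplified further. QUALIFY lets you filter on the alias of a window function, so it effectively acts as a HAVING clause for window functions: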
SELECT
    VALUE:COL1 AS COL1,
    VALUE:COL2 AS COL2,
    VALUE:COL3 AS COL3,
    VALUE:COL4 AS COL4,
    RANK() OVER (PARTITION BY VALUE:COL1 ORDER BY VALUE:COL4 DESC) AS rk
FROM '@S3_BUCKET/',
    LATERAL FLATTEN(input => $1:value)
QUALIFY rk = 1
@Nick: with QUALIFY, this would act as a WHERE filter set to rk = 1, and at the same time RANK would be replaced with ROW_NUMBER. Does that make sense?
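A minimal sketch of the ROW_NUMBER variant raised in the comment, assuming the same stage name and column names as the question (it has not been run against real data). RANK() gives tied rows the same rank, so several rows per COL1 can come back when they share the maximum COL4, which matches what the original join returns. ROW_NUMBER() returns exactly one row per COL1, picked arbitrarily among ties unless further ORDER BY keys are added. The window expression is written directly in QUALIFY, so no rk column appears in the output.

```sql
-- Sketch only: '@S3_BUCKET/' and COL1..COL4 are the placeholder names used in the question.
SELECT
    VALUE:COL1 AS COL1,
    VALUE:COL2 AS COL2,
    VALUE:COL3 AS COL3,
    VALUE:COL4 AS COL4
FROM '@S3_BUCKET/',
    LATERAL FLATTEN(input => $1:value)
-- Keep a single row per COL1: the one with the largest COL4.
QUALIFY ROW_NUMBER() OVER (PARTITION BY VALUE:COL1 ORDER BY VALUE:COL4 DESC) = 1;
```

Which function fits depends on whether ties on COL4 should all be kept (RANK) or collapsed to one row (ROW_NUMBER). Either way the stage is read only once, rather than twice as in the two join-based queries.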