Refactoring a SQL Query with a CTE vs. a Subquery

Question

I am creating a dataset from JSON files in an S3 bucket and would like to improve the query's performance. The two approaches below both work, but I would like to see a better query and learn how to improve my SQL skills. Sorry that there is no sample dataset: I have not found a practical way to provide mock data when reading from .json files in S3.

Query #1

-- block_1: every flattened row with all four columns
WITH block_1 AS (
    SELECT
        VALUE:COL1 AS COL1,
        VALUE:COL2 AS COL2,
        VALUE:COL3 AS COL3,
        VALUE:COL4 AS COL4
    FROM '@S3_BUCKET/',
        lateral flatten(input => $1:value)
),
-- block_2: the maximum COL4 per COL1 (second read of the stage)
block_2 AS (
    SELECT
        VALUE:COL1 AS COL1,
        MAX(VALUE:COL4) AS MaxCOL4
    FROM '@S3_BUCKET/',
        lateral flatten(input => $1:value)
    GROUP BY COL1
)
-- Keep only the rows whose COL4 equals the maximum for their COL1
SELECT a.COL1, a.COL2, a.COL3, a.COL4
FROM block_1 AS a
JOIN block_2 AS b
    ON a.COL1 = b.COL1 AND a.COL4 = b.MaxCOL4;

Query #2, which I felt was an improvement, mainly because the final SELECT statement does not need to list each column (as it does above):

SELECT a.*
FROM (
    SELECT
        VALUE:COL1 AS COL1,
        VALUE:COL2 AS COL2,
        VALUE:COL3 AS COL3,
        VALUE:COL4 AS COL4
    FROM '@S3_BUCKET/',
        lateral flatten(input => $1:value)
) a
JOIN (
    SELECT COL1, MAX(COL4) COL4
    FROM (
        SELECT
            VALUE:COL1 AS COL1,
            VALUE:COL2 AS COL2,
            VALUE:COL3 AS COL3,
            VALUE:COL4 AS COL4
        FROM '@S3_BUCKET/',
            lateral flatten(input => $1:value)
    )
    GROUP BY COL1
) b
    ON a.COL1 = b.COL1 AND a.COL4 = b.COL4;

These two queries are my current attempts. Is there a way to make this query better? The other route I considered was using WHERE ... IN with the list of COL1 values, but then I would still have to read from S3 twice, as in the queries above.

Answer 1

You should be able to use window functions, specifically RANK(), to simplify this query:

WITH block_1 AS (
    SELECT 
    VALUE:COL1 AS COL1, 
    VALUE:COL2 AS COL2, 
    VALUE:COL3 AS COL3,
    VALUE:COL4 AS COL4,
    RANK() OVER (PARTITION BY VALUE:COL1 ORDER BY VALUE:COL4 DESC) AS rk
    FROM '@S3_BUCKET/', 
     lateral flatten( input => $1:value)
)
SELECT COL1, COL2, COL3, COL4
FROM block_1
WHERE rk = 1

Snowflake's QUALIFY clause simplifies this further. It filters on the result of a window function, and can reference the function's alias, in the way that HAVING filters on aggregates:

SELECT 
    VALUE:COL1 AS COL1, 
    VALUE:COL2 AS COL2, 
    VALUE:COL3 AS COL3,
    VALUE:COL4 AS COL4,
    RANK() OVER (PARTITION BY VALUE:COL1 ORDER BY VALUE:COL4 DESC) AS rk
FROM '@S3_BUCKET/', 
     lateral flatten( input => $1:value)
QUALIFY rk = 1
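
Since the question has no sample data, the following is a minimal sketch for checking the rewrite locally. It uses Python's built-in sqlite3 module with made-up rows in a hypothetical table named flattened, standing in for the flattened stage data. SQLite has no QUALIFY clause, so the sketch uses the CTE form with WHERE rk = 1, and it assumes a bundled SQLite of version 3.25 or later for window function support.

```python
import sqlite3

# Hypothetical rows standing in for the flattened JSON; only the column
# names COL1..COL4 come from the question, the values are invented.
rows = [
    ("a", "x1", "y1", 10),
    ("a", "x2", "y2", 30),
    ("b", "x3", "y3", 5),
    ("b", "x4", "y4", 7),
    ("c", "x5", "y5", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flattened (COL1 TEXT, COL2 TEXT, COL3 TEXT, COL4 INTEGER)"
)
conn.executemany("INSERT INTO flattened VALUES (?, ?, ?, ?)", rows)

# Shape of the question's queries: join each row to the per-COL1 maximum
# of COL4, which references the source twice.
join_sql = """
SELECT a.COL1, a.COL2, a.COL3, a.COL4
FROM flattened AS a
JOIN (
    SELECT COL1, MAX(COL4) AS MaxCOL4 FROM flattened GROUP BY COL1
) AS b
  ON a.COL1 = b.COL1 AND a.COL4 = b.MaxCOL4
ORDER BY a.COL1, a.COL2
"""

# Shape of the answer: rank rows within each COL1 by COL4 descending and
# keep rank 1, which references the source once.
rank_sql = """
WITH ranked AS (
    SELECT COL1, COL2, COL3, COL4,
           RANK() OVER (PARTITION BY COL1 ORDER BY COL4 DESC) AS rk
    FROM flattened
)
SELECT COL1, COL2, COL3, COL4
FROM ranked
WHERE rk = 1
ORDER BY COL1, COL2
"""

join_result = conn.execute(join_sql).fetchall()
rank_result = conn.execute(rank_sql).fetchall()

print(join_result == rank_result)  # True
print(rank_result)
```

On these rows both queries return the same three rows, one per COL1 value. This only checks that the results agree; it says nothing about how Snowflake plans either query against a stage.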

Answer 2

@nick Use QUALIFY; it acts like a WHERE filter on the window function, so set it to = 1. Also replace RANK with ROW_NUMBER. Does that make sense?
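
Whether to swap RANK for ROW_NUMBER depends on how ties should be handled. The sketch below illustrates the difference with Python's built-in sqlite3 module and invented rows in a hypothetical table named flattened; the helper top_rows is also hypothetical. It again assumes SQLite 3.25 or later and uses WHERE on a CTE because SQLite lacks QUALIFY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flattened (COL1 TEXT, COL2 TEXT, COL4 INTEGER)")
# Invented data: group "a" has two rows tied on the maximum COL4.
conn.executemany(
    "INSERT INTO flattened VALUES (?, ?, ?)",
    [("a", "x1", 30), ("a", "x2", 30), ("b", "x3", 7)],
)


def top_rows(window_fn):
    """Return the rows numbered 1 within each COL1 by the given function."""
    # window_fn is interpolated directly, so pass only trusted names here.
    sql = f"""
    WITH ranked AS (
        SELECT COL1, COL2, COL4,
               {window_fn}() OVER (PARTITION BY COL1 ORDER BY COL4 DESC) AS rn
        FROM flattened
    )
    SELECT COL1, COL2, COL4 FROM ranked WHERE rn = 1 ORDER BY COL1, COL2
    """
    return conn.execute(sql).fetchall()


rank_rows = top_rows("RANK")
row_number_rows = top_rows("ROW_NUMBER")

print(len(rank_rows))        # 3: both tied rows of "a" plus the row of "b"
print(len(row_number_rows))  # 2: exactly one row per COL1
```

RANK keeps every row tied for the maximum, which matches what the original join on MAX(COL4) returns. ROW_NUMBER keeps exactly one row per COL1, but which of the tied rows survives is arbitrary unless the ORDER BY includes a tiebreaker column.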

Answer 1
Votes: 1, accepted, 2020-01-03 01:47:59

Answer 2
Votes: 0, 2020-01-03 21:05:07