Using a where statement with rank and a subquery in SQL

I have a table that looks roughly like this:

DELIVERY_AREA_ID  DELIVERY_RADIUS_METERS  EVENT_STARTED_TIMESTAMP
234sfd            4000                    2020-01-01 12:19:29.719
234sfd            6500                    2020-01-01 12:31:40.325
234sfd            3500                    2020-01-01 12:53:10.538
234sfd            6500                    2020-01-01 13:11:36.094
234sfd            3500                    2020-01-01 13:32:26.754
234sfd            6500                    2020-01-01 13:59:11.104
234sfd            6500                    2020-01-02 07:44:16.792
234sfd            3500                    2020-01-02 08:07:36.284
234sfd            6500                    2020-01-02 08:54:08.014
234sfd            3500                    2020-01-02 09:53:05.853
234sfd            6500                    2020-01-02 10:04:39.443
234sfd            10000                   2020-07-01 08:29:20.194
234sfd            3500                    2020-07-03 07:50:41.782
234sfd            10000                   2020-07-03 08:33:14.695
234sfd            3500                    2020-07-05 07:47:05.539
234sfd            10000                   2020-07-05 07:53:13.930
234sfd            3500                    2020-07-05 09:18:57.688
234sfd            10000                   2020-07-05 09:51:07.547
234sfd            3500                    2020-07-19 18:02:14.099

The real data is much more varied, but it follows this format.

In a single query on a Snowflake database, I am trying to get the top-ranked radius by duration. I currently have this:

SELECT DELIVERY_AREA_ID,
       MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
       MONTH_YEAR,
       DELIVERY_RADIUS_METERS,
       SUM(DURATION_SECONDS) AS total_duration,
       MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP,
       RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
                    ORDER BY SUM(DURATION_SECONDS) DESC) AS RADIUS_RANK
FROM (
    -- Add the MONTH_YEAR column to the delivery_radius_log table
    SELECT DELIVERY_AREA_ID,
           DELIVERY_RADIUS_METERS,
           EVENT_STARTED_TIMESTAMP,
           CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
                  YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
           DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
           DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
    FROM delivery_radius_log
) t  -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS

I want to get the first-ranked row for each month_year, but when I add

where RADIUS_RANK = 1

I get an error: Syntax error: unexpected 'where'. (line 21)

I'm not sure how to resolve this.

I have looked at this link, which appears to ask the same question, but the solution there is already what I am trying.

With WHERE or HAVING alone, this cannot be solved at the same query level. You have to query the output of your query, in other words, use the output of that query as the input for another top-level query.

  • You cannot use a field produced at the projection level (the SELECT list) in the WHERE clause
  • You cannot use analytic functions in the WHERE clause
  • You cannot use analytic functions in a HAVING clause

So, without QUALIFY, the solution is to wrap that query in an outer query and keep only the rows whose rank is 1.
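The outer-query approach described above could be sketched as follows. This is a minimal sketch that reuses the query from the question unchanged as a derived table; the alias ranked is a name chosen here for illustration, and the unused end_timestamp column is omitted for brevity.

```sql
-- Outer query: filter on the rank computed by the inner query.
SELECT *
FROM (
    -- Inner query: aggregate per area, month and radius, then rank by total duration.
    SELECT DELIVERY_AREA_ID,
           MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
           MONTH_YEAR,
           DELIVERY_RADIUS_METERS,
           SUM(DURATION_SECONDS) AS total_duration,
           MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP,
           RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
                        ORDER BY SUM(DURATION_SECONDS) DESC) AS RADIUS_RANK
    FROM (
        -- Duration of each event = time until the next event in the same area.
        SELECT DELIVERY_AREA_ID,
               DELIVERY_RADIUS_METERS,
               EVENT_STARTED_TIMESTAMP,
               CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
                      YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
               DATEDIFF(second, EVENT_STARTED_TIMESTAMP,
                        LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID
                                                            ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
        FROM delivery_radius_log
    ) t
    GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
) ranked  -- hypothetical alias for the derived table
WHERE RADIUS_RANK = 1;
```

The WHERE clause is valid here because, from the outer query's point of view, RADIUS_RANK is an ordinary column of the derived table rather than a window function evaluated at the same level.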

To filter on a window function at the same query level, use the QUALIFY clause:

SELECT DELIVERY_AREA_ID,
       MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
       MONTH_YEAR,
       DELIVERY_RADIUS_METERS,
       SUM(DURATION_SECONDS) AS total_duration,
       MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP,
       RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
                    ORDER BY SUM(DURATION_SECONDS) DESC) AS RADIUS_RANK
FROM (
    -- Add the MONTH_YEAR column to the delivery_radius_log table
    SELECT DELIVERY_AREA_ID,
           DELIVERY_RADIUS_METERS,
           EVENT_STARTED_TIMESTAMP,
           CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
                  YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
           DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
           DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
    FROM delivery_radius_log
) t  -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
QUALIFY RADIUS_RANK = 1;

If the rank column is not required in the output, the entire expression can be moved into QUALIFY:

SELECT DELIVERY_AREA_ID,
       MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
       MONTH_YEAR,
       DELIVERY_RADIUS_METERS,
       SUM(DURATION_SECONDS) AS total_duration,
       MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP
FROM (
    -- Add the MONTH_YEAR column to the delivery_radius_log table
    SELECT DELIVERY_AREA_ID,
           DELIVERY_RADIUS_METERS,
           EVENT_STARTED_TIMESTAMP,
           CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
                  YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
           DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
           DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
    FROM delivery_radius_log
) t  -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
QUALIFY RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
                    ORDER BY SUM(DURATION_SECONDS) DESC) = 1;
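One thing to keep in mind is that RANK() assigns the same rank to ties, so two radii with identical total duration in the same month would both be returned. If exactly one row per area and month is needed, a possible variant is ROW_NUMBER() with an explicit tiebreaker. The sketch below assumes, purely for illustration, that the larger radius should win a tie; that rule is not part of the question.

```sql
-- Sketch: same aggregation as above, but ROW_NUMBER() keeps exactly one row
-- per (DELIVERY_AREA_ID, MONTH_YEAR), even when total durations tie.
SELECT DELIVERY_AREA_ID,
       MONTH_YEAR,
       DELIVERY_RADIUS_METERS,
       SUM(DURATION_SECONDS) AS total_duration
FROM (
    SELECT DELIVERY_AREA_ID,
           DELIVERY_RADIUS_METERS,
           CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
                  YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
           DATEDIFF(second, EVENT_STARTED_TIMESTAMP,
                    LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID
                                                        ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
    FROM delivery_radius_log
) t
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
-- Assumed tiebreaker: on equal total duration, prefer the larger radius.
QUALIFY ROW_NUMBER() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
                           ORDER BY SUM(DURATION_SECONDS) DESC,
                                    DELIVERY_RADIUS_METERS DESC) = 1;
```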
