简体   繁体   中英

Using a where statement with rank and a subquery in SQL

so I have a table that's sort of like this:

DELIVERY_AREA_ID DELIVERY_RADIUS_METERS EVENT_STARTED_TIMESTAMP
234sfd 4000 2020-01-01 12:19:29.719
234sfd 6500 2020-01-01 12:31:40.325
234sfd 3500 2020-01-01 12:53:10.538
234sfd 6500 2020-01-01 13:11:36.094
234sfd 3500 2020-01-01 13:32:26.754
234sfd 6500 2020-01-01 13:59:11.104
234sfd 6500 2020-01-02 07:44:16.792
234sfd 3500 2020-01-02 08:07:36.284
234sfd 6500 2020-01-02 08:54:08.014
234sfd 3500 2020-01-02 09:53:05.853
234sfd 6500 2020-01-02 10:04:39.443
234sfd 10000 2020-07-01 08:29:20.194
234sfd 3500 2020-07-03 07:50:41.782
234sfd 10000 2020-07-03 08:33:14.695
234sfd 3500 2020-07-05 07:47:05.539
234sfd 10000 2020-07-05 07:53:13.930
234sfd 3500 2020-07-05 09:18:57.688
234sfd 10000 2020-07-05 09:51:07.547
234sfd 3500 2020-07-19 18:02:14.099

the data is actually much more varied but yeah it follows that format.

I am trying to, in one query, in snowflake database, make a get the top ranked radius by duration. I currently have this:

SELECT DELIVERY_AREA_ID,
       MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
       MONTH_YEAR,
       DELIVERY_RADIUS_METERS,
       SUM(DURATION_SECONDS) AS total_duration,
       MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP,
       RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
                    ORDER BY SUM(DURATION_SECONDS) DESC) AS RADIUS_RANK
FROM (
    -- Add the MONTH_YEAR column to the delivery_radius_log table
    SELECT DELIVERY_AREA_ID,
           DELIVERY_RADIUS_METERS,
           EVENT_STARTED_TIMESTAMP,
           CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
                  YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
           DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
           DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
    FROM delivery_radius_log
) t  -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS

I want to get the first rank for each month_year but when I use

where RADIUS_RANK = 1

I get an error: Syntax error: unexpected 'where'. (line 21)

Im not sure how to resolve this

I have tried this link which appears to have the same question but the solution is already what I am trying.

It is not possible to solve this scenario without querying the output of your query, in other words, using the output of that query as an input for another top-level query.

  • You can not use a field produced at the projection level in the WHERE clause
  • You can not use analytic functions in the WHERE clause
  • You can not use analytic functions in a HAVING clause

So the only solution is to query the output of that query and retrieve only the MIN rank.

To filter windowed function at the same query level you need to use QUALIFY clause:

SELECT DELIVERY_AREA_ID,
       MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
       MONTH_YEAR,
       DELIVERY_RADIUS_METERS,
       SUM(DURATION_SECONDS) AS total_duration,
       MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP,
       RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
                    ORDER BY SUM(DURATION_SECONDS) DESC) AS RADIUS_RANK
FROM (
    -- Add the MONTH_YEAR column to the delivery_radius_log table
    SELECT DELIVERY_AREA_ID,
           DELIVERY_RADIUS_METERS,
           EVENT_STARTED_TIMESTAMP,
           CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
                  YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
           DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
           DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
    FROM delivery_radius_log
) t  -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
QUALIFY RADIUS_RANK = 1;

If the rank column is not required then the entire expression could be moved:

SELECT DELIVERY_AREA_ID,
       MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
       MONTH_YEAR,
       DELIVERY_RADIUS_METERS,
       SUM(DURATION_SECONDS) AS total_duration,
       MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP
FROM (
    -- Add the MONTH_YEAR column to the delivery_radius_log table
    SELECT DELIVERY_AREA_ID,
           DELIVERY_RADIUS_METERS,
           EVENT_STARTED_TIMESTAMP,
           CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
                  YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
           DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
           DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
    FROM delivery_radius_log
) t  -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
QUALIFY RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
                    ORDER BY SUM(DURATION_SECONDS) DESC) = 1;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM