so I have a table that's sort of like this:
DELIVERY_AREA_ID | DELIVERY_RADIUS_METERS | EVENT_STARTED_TIMESTAMP |
---|---|---|
234sfd | 4000 | 2020-01-01 12:19:29.719 |
234sfd | 6500 | 2020-01-01 12:31:40.325 |
234sfd | 3500 | 2020-01-01 12:53:10.538 |
234sfd | 6500 | 2020-01-01 13:11:36.094 |
234sfd | 3500 | 2020-01-01 13:32:26.754 |
234sfd | 6500 | 2020-01-01 13:59:11.104 |
234sfd | 6500 | 2020-01-02 07:44:16.792 |
234sfd | 3500 | 2020-01-02 08:07:36.284 |
234sfd | 6500 | 2020-01-02 08:54:08.014 |
234sfd | 3500 | 2020-01-02 09:53:05.853 |
234sfd | 6500 | 2020-01-02 10:04:39.443 |
234sfd | 10000 | 2020-07-01 08:29:20.194 |
234sfd | 3500 | 2020-07-03 07:50:41.782 |
234sfd | 10000 | 2020-07-03 08:33:14.695 |
234sfd | 3500 | 2020-07-05 07:47:05.539 |
234sfd | 10000 | 2020-07-05 07:53:13.930 |
234sfd | 3500 | 2020-07-05 09:18:57.688 |
234sfd | 10000 | 2020-07-05 09:51:07.547 |
234sfd | 3500 | 2020-07-19 18:02:14.099 |
the data is actually much more varied but yeah it follows that format.
I am trying to, in one query, in snowflake database, make a get the top ranked radius by duration. I currently have this:
SELECT DELIVERY_AREA_ID,
MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
MONTH_YEAR,
DELIVERY_RADIUS_METERS,
SUM(DURATION_SECONDS) AS total_duration,
MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP,
RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
ORDER BY SUM(DURATION_SECONDS) DESC) AS RADIUS_RANK
FROM (
-- Add the MONTH_YEAR column to the delivery_radius_log table
SELECT DELIVERY_AREA_ID,
DELIVERY_RADIUS_METERS,
EVENT_STARTED_TIMESTAMP,
CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
FROM delivery_radius_log
) t -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
I want to get the first rank for each month_year but when I use
where RADIUS_RANK = 1
I get an error: Syntax error: unexpected 'where'. (line 21)
Im not sure how to resolve this
I have tried this link which appears to have the same question but the solution is already what I am trying.
It is not possible to solve this scenario without querying the output of your query, in other words, using the output of that query as an input for another top-level query.
So the only solution is to query the output of that query and retrieve only the MIN rank.
To filter windowed function at the same query level you need to use QUALIFY clause:
SELECT DELIVERY_AREA_ID,
MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
MONTH_YEAR,
DELIVERY_RADIUS_METERS,
SUM(DURATION_SECONDS) AS total_duration,
MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP,
RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
ORDER BY SUM(DURATION_SECONDS) DESC) AS RADIUS_RANK
FROM (
-- Add the MONTH_YEAR column to the delivery_radius_log table
SELECT DELIVERY_AREA_ID,
DELIVERY_RADIUS_METERS,
EVENT_STARTED_TIMESTAMP,
CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
FROM delivery_radius_log
) t -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
QUALIFY RADIUS_RANK = 1;
If the rank column is not required then the entire expression could be moved:
SELECT DELIVERY_AREA_ID,
MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
MONTH_YEAR,
DELIVERY_RADIUS_METERS,
SUM(DURATION_SECONDS) AS total_duration,
MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP
FROM (
-- Add the MONTH_YEAR column to the delivery_radius_log table
SELECT DELIVERY_AREA_ID,
DELIVERY_RADIUS_METERS,
EVENT_STARTED_TIMESTAMP,
CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
FROM delivery_radius_log
) t -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
QUALIFY RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
ORDER BY SUM(DURATION_SECONDS) DESC) = 1;
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.