Using a where statement with rank and a subquery in SQL
So I have a table that's sort of like this:
| DELIVERY_AREA_ID | DELIVERY_RADIUS_METERS | EVENT_STARTED_TIMESTAMP |
|---|---|---|
| 234sfd | 4000 | 2020-01-01 12:19:29.719 |
| 234sfd | 6500 | 2020-01-01 12:31:40.325 |
| 234sfd | 3500 | 2020-01-01 12:53:10.538 |
| 234sfd | 6500 | 2020-01-01 13:11:36.094 |
| 234sfd | 3500 | 2020-01-01 13:32:26.754 |
| 234sfd | 6500 | 2020-01-01 13:59:11.104 |
| 234sfd | 6500 | 2020-01-02 07:44:16.792 |
| 234sfd | 3500 | 2020-01-02 08:07:36.284 |
| 234sfd | 6500 | 2020-01-02 08:54:08.014 |
| 234sfd | 3500 | 2020-01-02 09:53:05.853 |
| 234sfd | 6500 | 2020-01-02 10:04:39.443 |
| 234sfd | 10000 | 2020-07-01 08:29:20.194 |
| 234sfd | 3500 | 2020-07-03 07:50:41.782 |
| 234sfd | 10000 | 2020-07-03 08:33:14.695 |
| 234sfd | 3500 | 2020-07-05 07:47:05.539 |
| 234sfd | 10000 | 2020-07-05 07:53:13.930 |
| 234sfd | 3500 | 2020-07-05 09:18:57.688 |
| 234sfd | 10000 | 2020-07-05 09:51:07.547 |
| 234sfd | 3500 | 2020-07-19 18:02:14.099 |
The data is actually much more varied, but it follows that format.
I am trying, in one query in a Snowflake database, to get the top-ranked radius by duration. I currently have this:
SELECT DELIVERY_AREA_ID,
MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
MONTH_YEAR,
DELIVERY_RADIUS_METERS,
SUM(DURATION_SECONDS) AS total_duration,
MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP,
RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
ORDER BY SUM(DURATION_SECONDS) DESC) AS RADIUS_RANK
FROM (
-- Add the MONTH_YEAR column to the delivery_radius_log table
SELECT DELIVERY_AREA_ID,
DELIVERY_RADIUS_METERS,
EVENT_STARTED_TIMESTAMP,
CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
FROM delivery_radius_log
) t -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
I want to get the first rank for each month_year, but when I use
where RADIUS_RANK = 1
I get an error: Syntax error: unexpected 'where'. (line 21)
I'm not sure how to resolve this.
I have tried this link, which appears to ask the same question, but the solution there is already what I am trying.
A WHERE clause cannot filter on the result of a window function at the same query level: WHERE is evaluated before window functions are computed, and it must also appear before GROUP BY, which is why appending it after the GROUP BY raises a syntax error.
So the portable solution is to use the output of that query as the input to another, top-level query and keep only the rows with the minimum rank, that is, RADIUS_RANK = 1.
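The wrapping approach described above could be sketched roughly as follows. This is only an illustration built from the query in the question: the CTE names `durations` and `ranked` are made up for this example, and the `end_timestamp`, `default_delivery_radius` and `MAX_TIMESTAMP` columns are left out to keep it short.

```sql
-- Step 1: per-event duration, measured up to the next event in the same area.
-- "durations" and "ranked" are hypothetical names, not part of the original query.
WITH durations AS (
    SELECT DELIVERY_AREA_ID,
           DELIVERY_RADIUS_METERS,
           CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
                  YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
           DATEDIFF(second,
                    EVENT_STARTED_TIMESTAMP,
                    LEAD(EVENT_STARTED_TIMESTAMP) OVER (
                        PARTITION BY DELIVERY_AREA_ID
                        ORDER BY EVENT_STARTED_TIMESTAMP)) AS DURATION_SECONDS
    FROM delivery_radius_log
),
-- Step 2: total duration per radius and month, ranked within each area and month.
ranked AS (
    SELECT DELIVERY_AREA_ID,
           MONTH_YEAR,
           DELIVERY_RADIUS_METERS,
           SUM(DURATION_SECONDS) AS TOTAL_DURATION,
           RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
                        ORDER BY SUM(DURATION_SECONDS) DESC) AS RADIUS_RANK
    FROM durations
    GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
)
-- Step 3: the rank is now an ordinary column, so WHERE can filter on it.
SELECT *
FROM ranked
WHERE RADIUS_RANK = 1;
```

Because RANK() assigns the same rank to ties, two radii with identical total duration in a month would both be returned; ROW_NUMBER() could be swapped in if exactly one row per month is needed.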
To filter on a window function at the same query level, use the QUALIFY clause:
SELECT DELIVERY_AREA_ID,
MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
MONTH_YEAR,
DELIVERY_RADIUS_METERS,
SUM(DURATION_SECONDS) AS total_duration,
MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP,
RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
ORDER BY SUM(DURATION_SECONDS) DESC) AS RADIUS_RANK
FROM (
-- Add the MONTH_YEAR column to the delivery_radius_log table
SELECT DELIVERY_AREA_ID,
DELIVERY_RADIUS_METERS,
EVENT_STARTED_TIMESTAMP,
CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
FROM delivery_radius_log
) t -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
QUALIFY RADIUS_RANK = 1;
If the rank column is not required in the output, the entire window expression can be moved into QUALIFY:
SELECT DELIVERY_AREA_ID,
MAX(DELIVERY_RADIUS_METERS) AS default_delivery_radius,
MONTH_YEAR,
DELIVERY_RADIUS_METERS,
SUM(DURATION_SECONDS) AS total_duration,
MAX(EVENT_STARTED_TIMESTAMP) AS MAX_TIMESTAMP
FROM (
-- Add the MONTH_YEAR column to the delivery_radius_log table
SELECT DELIVERY_AREA_ID,
DELIVERY_RADIUS_METERS,
EVENT_STARTED_TIMESTAMP,
CONCAT(MONTH(EVENT_STARTED_TIMESTAMP), '/',
YEAR(EVENT_STARTED_TIMESTAMP)) AS MONTH_YEAR,
DATEADD(second, DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)), EVENT_STARTED_TIMESTAMP) AS end_timestamp,
DATEDIFF(second, EVENT_STARTED_TIMESTAMP, LEAD(EVENT_STARTED_TIMESTAMP) OVER (PARTITION BY DELIVERY_AREA_ID ORDER BY EVENT_STARTED_TIMESTAMP)) AS duration_seconds
FROM delivery_radius_log
) t -- added alias here
GROUP BY DELIVERY_AREA_ID, MONTH_YEAR, DELIVERY_RADIUS_METERS
QUALIFY RANK() OVER (PARTITION BY DELIVERY_AREA_ID, MONTH_YEAR
ORDER BY SUM(DURATION_SECONDS) DESC) = 1;