SQL count distinct over partition by cumulatively
I am using AWS Athena (Presto based) and I have this table named base:
| id | category | year | month |
|---|---|---|---|
| 1 | a | 2021 | 6 |
| 1 | b | 2022 | 8 |
| 1 | a | 2022 | 11 |
| 2 | a | 2022 | 1 |
| 2 | a | 2022 | 4 |
| 2 | b | 2022 | 6 |
I would like to craft a query that counts the distinct categories per id, cumulatively in year and month order, while retaining the original columns:
| id | category | year | month | sumC |
|---|---|---|---|---|
| 1 | a | 2021 | 6 | 1 |
| 1 | b | 2022 | 8 | 2 |
| 1 | a | 2022 | 11 | 2 |
| 2 | a | 2022 | 1 | 1 |
| 2 | a | 2022 | 4 | 1 |
| 2 | b | 2022 | 6 | 2 |
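To pin down what "cumulative distinct count" means here, the expected sumC column can be reproduced procedurally. This is only a reference sketch in plain Python, not an Athena query; the names `ROWS` and `cumulative_distinct` are made up for illustration, and the rows are copied from the table above.

```python
from collections import defaultdict

# Sample rows copied from the question: (id, category, year, month).
ROWS = [
    (1, "a", 2021, 6),
    (1, "b", 2022, 8),
    (1, "a", 2022, 11),
    (2, "a", 2022, 1),
    (2, "a", 2022, 4),
    (2, "b", 2022, 6),
]


def cumulative_distinct(rows):
    """Return each row plus the number of distinct categories seen so far
    for its id, walking the rows in (id, year, month) order."""
    seen = defaultdict(set)  # id -> categories encountered so far
    result = []
    for id_, category, year, month in sorted(rows, key=lambda r: (r[0], r[2], r[3])):
        seen[id_].add(category)
        result.append((id_, category, year, month, len(seen[id_])))
    return result


expected = cumulative_distinct(ROWS)
for row in expected:
    print(row)
```

A repeated category (for example the second `a` for id 1) leaves the count unchanged, which is exactly what a plain running count gets wrong.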
I've tried the following query with no success:
SELECT id,
       category,
       year,
       month,
       COUNT(category) OVER (PARTITION BY id ORDER BY year, month) AS sumC
FROM base;
This results in 1, 2, 3, 1, 2, 3, which is not what I'm looking for: the window counts every row, not the distinct categories. I'd need something like a COUNT(DISTINCT) inside a window function, though that is not supported as a construct.
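The 1, 2, 3, 1, 2, 3 result can be reproduced locally. The sketch below is an assumption-laden stand-in: it runs the windowed COUNT against an in-memory SQLite database through Python's `sqlite3` module (SQLite 3.25 or newer is needed for window functions), not against Athena, and the helper name `run` is hypothetical.

```python
import sqlite3

# Sample rows copied from the question: (id, category, year, month).
ROWS = [
    (1, "a", 2021, 6),
    (1, "b", 2022, 8),
    (1, "a", 2022, 11),
    (2, "a", 2022, 1),
    (2, "a", 2022, 4),
    (2, "b", 2022, 6),
]

# The attempted query: a running COUNT over each id, ordered by year and month.
QUERY = """
SELECT COUNT(category) OVER (PARTITION BY id ORDER BY year, month) AS sumC
FROM base
ORDER BY id, year, month
"""


def run(query, rows):
    """Load the rows into an in-memory table named base and return the first
    column of the query result as a list."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE base (id INTEGER, category TEXT, year INTEGER, month INTEGER)"
        )
        conn.executemany("INSERT INTO base VALUES (?, ?, ?, ?)", rows)
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


plain_counts = run(QUERY, ROWS)
print(plain_counts)
```

Each row adds one to the running count whether or not its category has already appeared for that id.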
I also tried the DENSE_RANK trick:
  DENSE_RANK() OVER (PARTITION BY id ORDER BY category)
+ DENSE_RANK() OVER (PARTITION BY id ORDER BY category DESC)
- 1 AS sumC
However, because this expression does not order by year and month, it returns the total number of distinct categories per id on every row: 2, 2, 2, 2, 2, 2.
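The same local setup shows why the DENSE_RANK trick is not cumulative. This sketch assumes the usual form of the trick, with one ascending and one descending rank, and again uses in-memory SQLite via Python's `sqlite3` (3.25 or newer) as a stand-in for Athena; `run` is a hypothetical helper.

```python
import sqlite3

# Sample rows copied from the question: (id, category, year, month).
ROWS = [
    (1, "a", 2021, 6),
    (1, "b", 2022, 8),
    (1, "a", 2022, 11),
    (2, "a", 2022, 1),
    (2, "a", 2022, 4),
    (2, "b", 2022, 6),
]

# Ascending rank + descending rank - 1 equals the number of distinct
# categories in the whole partition, regardless of the row's position.
QUERY = """
SELECT DENSE_RANK() OVER (PARTITION BY id ORDER BY category)
     + DENSE_RANK() OVER (PARTITION BY id ORDER BY category DESC)
     - 1 AS sumC
FROM base
ORDER BY id, year, month
"""


def run(query, rows):
    """Load the rows into an in-memory table named base and return the first
    column of the query result as a list."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE base (id INTEGER, category TEXT, year INTEGER, month INTEGER)"
        )
        conn.executemany("INSERT INTO base VALUES (?, ?, ?, ?)", rows)
        return [row[0] for row in conn.execute(query)]
    finally:
        conn.close()


dense_rank_counts = run(QUERY, ROWS)
print(dense_rank_counts)
```

Both ids have two categories overall, so every row gets 2, even the earliest one where only a single category has been seen.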
Any help is appreciated!
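One option is to flag the first occurrence of each category per id with ROW_NUMBER, then take a running sum of those flags: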
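WITH cte AS (
    -- Flag the first row of each (id, category) pair with 1; repeats get 0.
    SELECT *,
           CASE WHEN ROW_NUMBER() OVER(
                         PARTITION BY id, category
                         ORDER BY year, month) = 1
                THEN 1
                ELSE 0
           END AS rn1
    FROM base
)
-- The running sum of the flags is the cumulative distinct count per id.
SELECT id,
       category,
       year,
       month,
       SUM(rn1) OVER(
           PARTITION BY id
           ORDER BY year, month
       ) AS sumC
FROM cte
ORDER BY id,
         year,
         month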
Does it work for you?
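As a quick sanity check, the first-occurrence flag plus running sum approach can be run against the sample data. This is a sketch under assumptions: it uses in-memory SQLite through Python's `sqlite3` (3.25 or newer for window functions) rather than Athena, the column names are the plain `year` and `month` from the question's table, and `run_rows` is a hypothetical helper.

```python
import sqlite3

# Sample rows copied from the question: (id, category, year, month).
ROWS = [
    (1, "a", 2021, 6),
    (1, "b", 2022, 8),
    (1, "a", 2022, 11),
    (2, "a", 2022, 1),
    (2, "a", 2022, 4),
    (2, "b", 2022, 6),
]

QUERY = """
WITH cte AS (
    -- 1 for the first row of each (id, category) pair, 0 for repeats.
    SELECT *,
           CASE WHEN ROW_NUMBER() OVER (
                         PARTITION BY id, category
                         ORDER BY year, month) = 1
                THEN 1 ELSE 0
           END AS rn1
    FROM base
)
SELECT id, category, year, month,
       SUM(rn1) OVER (PARTITION BY id ORDER BY year, month) AS sumC
FROM cte
ORDER BY id, year, month
"""


def run_rows(query, rows):
    """Load the rows into an in-memory table named base and return all
    result rows as a list of tuples."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE base (id INTEGER, category TEXT, year INTEGER, month INTEGER)"
        )
        conn.executemany("INSERT INTO base VALUES (?, ?, ?, ?)", rows)
        return conn.execute(query).fetchall()
    finally:
        conn.close()


answer_rows = run_rows(QUERY, ROWS)
for row in answer_rows:
    print(row)
```

On this data the sumC column matches the expected output from the question. One caveat worth checking on real data: if an id has several rows in the same year and month, the default window frame treats them as peers and gives them the same running total.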