Presto SQL query that counts the number of rows that satisfy a set of conditions within arithmetically progressing time intervals
I want to figure out how the count of rows that satisfy a set of conditions has changed over time. To this end, I would like to count the number of rows that satisfied the conditions at a starting date and then perform the same calculation for each following day up to the present day. My desired output table would look something like the table below (excluding the unnamed column):
|------------+------------+----------------------------|
| | date | rows_staisfying_conditions |
|------------+------------+----------------------------|
| start date | 2021-09-01 | 2367 |
| | 2021-09-02 | 2784 |
| | 2021-09-03 | 3011 |
| | 2021-09-04 | 3601 |
| today | 2021-09-05 | 4155 |
|------------+------------+----------------------------|
A naive approach to generating the above table is to have one CTE for each day and then join the CTEs (see the code below). The problem is that this is verbose and does not scale.
WITH day0 AS (
SELECT count(*) AS day0
FROM (
SELECT DISTINCT account_id
FROM default_table
WHERE
secret_column = 'secret value'
AND lower(device_os) LIKE '%android%'
AND from_iso8601_timestamp(timestamp) < from_iso8601_timestamp('2021-09-01T00:00:00.0000000Z') + interval '0' day
)
),
day1 AS (
SELECT count(*) AS day1
FROM (
SELECT DISTINCT account_id
FROM default_table
WHERE
secret_column = 'secret value'
AND lower(device_os) LIKE '%android%'
AND from_iso8601_timestamp(timestamp) < from_iso8601_timestamp('2021-09-01T00:00:00.0000000Z') + interval '1' day
)
),
⋮
day4 AS (
SELECT count(*) AS day4
FROM (
SELECT DISTINCT account_id
FROM default_table
WHERE
secret_column = 'secret value'
AND lower(device_os) LIKE '%android%'
AND from_iso8601_timestamp(timestamp) < from_iso8601_timestamp('2021-09-01T00:00:00.0000000Z') + interval '4' day
)
)
SELECT *
FROM
day0
FULL JOIN day1 ON TRUE
FULL JOIN day2 ON TRUE
FULL JOIN day3 ON TRUE
FULL JOIN day4 ON TRUE
Does anyone have a suggestion for how I could compute the above table in a scalable manner?
You can count the matching rows grouped by date, then apply the sum window function over those counts, ordered by date, with the following frame to get a running total:
WITH dataset (date, condition) AS
(
VALUES
(date '2021-09-01', true),
(date '2021-09-01', true),
(date '2021-09-01', false),
(date '2021-09-02', true),
(date '2021-09-03', true)
)
SELECT date, sum(cnt) over (order by date range between unbounded preceding and current row) rows_staisfying_conditions
FROM (
SELECT date, sum(case when condition then 1 else 0 end) cnt
FROM dataset
GROUP BY date
)
Output:
date | rows_staisfying_conditions |
---|---|
2021-09-01 | 2 |
2021-09-02 | 3 |
2021-09-03 | 4 |
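Note that the CTEs in the question count distinct account_id values whose timestamp falls before each daily cutoff, whereas a running sum of per-day counts counts rows. The following is a minimal sketch of one way to reproduce the question's per-day numbers without writing one CTE per day. It assumes the table and columns from the question (default_table, account_id, secret_column, device_os, timestamp). The CTE names first_seen and cutoffs are made up for illustration, and the query has not been run against the real data.

```sql
WITH first_seen AS (
    -- Earliest qualifying event per account, so each account is counted once.
    SELECT
        account_id,
        min(from_iso8601_timestamp(timestamp)) AS first_ts
    FROM default_table
    WHERE
        secret_column = 'secret value'
        AND lower(device_os) LIKE '%android%'
    GROUP BY account_id
),
cutoffs AS (
    -- One cutoff per day: start, start + 1 day, ..., up to the current time.
    -- This replaces the hand-written interval '0' day, interval '1' day, ...
    SELECT
        date_add('day', n, from_iso8601_timestamp('2021-09-01T00:00:00.0000000Z')) AS cutoff
    FROM UNNEST(
        sequence(
            0,
            date_diff('day', from_iso8601_timestamp('2021-09-01T00:00:00.0000000Z'), now())
        )
    ) AS t(n)
)
SELECT
    date(c.cutoff) AS date,
    -- An account has some qualifying row before the cutoff
    -- exactly when its earliest qualifying row is before the cutoff.
    count_if(f.first_ts < c.cutoff) AS rows_staisfying_conditions
FROM cutoffs c
CROSS JOIN first_seen f
GROUP BY c.cutoff
ORDER BY c.cutoff
```

The cross join compares every account with every cutoff, so the work grows with the number of days times the number of qualifying accounts. That is usually acceptable for a few hundred days, but for long ranges it may be cheaper to count new accounts per first-seen day and take a running sum as in the answer above. If no account satisfies the conditions, this sketch returns no rows rather than a row of zeros per day.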