Presto SQL query to count the rows satisfying a set of conditions at arithmetic time intervals

Question

I want to figure out how the count of rows that satisfy a set of conditions has changed over time. To this end, I would like to count the number of rows that satisfied the conditions at a starting date and then perform the same calculation for each following day, up to the present day. My desired output table would look something like the table below (excluding the unnamed column):

|------------+------------+----------------------------|
|            |       date | rows_satisfying_conditions |
|------------+------------+----------------------------|
| start date | 2021-09-01 |                       2367 |
|            | 2021-09-02 |                       2784 |
|            | 2021-09-03 |                       3011 |
|            | 2021-09-04 |                       3601 |
| today      | 2021-09-05 |                       4155 |
|------------+------------+----------------------------|

A naive approach to generating the above table is to have one CTE for each day and then join the CTEs (see the code below). The problem is that this is verbose and does not scale.

WITH day0 AS (
    SELECT count(*) AS day0
    FROM (
        SELECT DISTINCT account_id
        FROM default_table
        WHERE
            secret_column = 'secret value'
            AND lower(device_os) LIKE '%android%'
            AND from_iso8601_timestamp(timestamp) < from_iso8601_timestamp('2021-09-01T00:00:00.0000000Z') + interval '0' day
    )
),
day1 AS (
    SELECT count(*) AS day1
    FROM (
        SELECT DISTINCT account_id
        FROM default_table
        WHERE
            secret_column = 'secret value'
            AND lower(device_os) LIKE '%android%'
            AND from_iso8601_timestamp(timestamp) < from_iso8601_timestamp('2021-09-01T00:00:00.0000000Z') + interval '1' day
    )
),
⋮
day4 AS (
    SELECT count(*) AS day4
    FROM (
        SELECT DISTINCT account_id
        FROM default_table
        WHERE
            secret_column = 'secret value'
            AND lower(device_os) LIKE '%android%'
            AND from_iso8601_timestamp(timestamp) < from_iso8601_timestamp('2021-09-01T00:00:00.0000000Z') + interval '4' day
    )
)
SELECT *
FROM
    day0
    FULL JOIN day1 ON TRUE
    FULL JOIN day2 ON TRUE
    FULL JOIN day3 ON TRUE
    FULL JOIN day4 ON TRUE

Does anyone have a suggestion for how I could compute the above table in a scalable manner?

Answer 1

You can group by date to get a per-day count, then apply the sum window function over those counts, ordered by date, with the following frame to turn them into a running total:

WITH dataset (date, condition) AS
(
  VALUES
  (date '2021-09-01', true),
  (date '2021-09-01', true),
  (date '2021-09-01', false),
  (date '2021-09-02', true),
  (date '2021-09-03', true)
)

SELECT date, sum(cnt) over (order by date range between unbounded preceding and current row) AS rows_satisfying_conditions
FROM (
         SELECT date, sum(case when condition then 1 else 0 end) cnt
         FROM dataset
         GROUP BY date
     )

Output:

date	rows_satisfying_conditions
2021-09-01	2
2021-09-02	3
2021-09-03	4
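
To apply the same idea to the table from the question, one possible sketch is shown here. It assumes the table and column names from the question (default_table, account_id, secret_column, device_os, timestamp), that each account should be counted once from its first qualifying row onward (matching the SELECT DISTINCT account_id in the question), and that the daily cutoffs fall at midnight UTC. The names first_seen, new_per_day, days and cutoff_day are made up for illustration. The calendar built with sequence and UNNEST keeps days on which no new account appears, which a plain GROUP BY on the data would skip.

```sql
-- Sketch only: table and column names come from the question;
-- the CTE names and cutoff_day are hypothetical.
WITH first_seen AS (
    -- First UTC calendar day on which each account has a qualifying row,
    -- so every account is counted at most once.
    SELECT
        account_id,
        min(CAST(from_iso8601_timestamp(timestamp) AT TIME ZONE 'UTC' AS date)) AS first_day
    FROM default_table
    WHERE
        secret_column = 'secret value'
        AND lower(device_os) LIKE '%android%'
    GROUP BY account_id
),
new_per_day AS (
    -- Number of accounts that first qualified on each day (one row per day).
    SELECT first_day, count(*) AS new_accounts
    FROM first_seen
    GROUP BY first_day
),
days AS (
    -- One row per cutoff day, from the start date through today.
    SELECT cutoff_day
    FROM UNNEST(sequence(date '2021-09-01', current_date, interval '1' day)) AS t(cutoff_day)
)
SELECT
    d.cutoff_day AS "date",
    -- Accounts whose first qualifying row is strictly before the cutoff,
    -- mirroring the "<" comparison in the question.
    sum(CASE WHEN n.first_day < d.cutoff_day THEN n.new_accounts ELSE 0 END) AS rows_satisfying_conditions
FROM days d
CROSS JOIN new_per_day n
GROUP BY d.cutoff_day
ORDER BY d.cutoff_day
```

Both sides of the CROSS JOIN hold at most one row per day, so the join stays small even when default_table is large. If no row matches the conditions at all, new_per_day is empty and the query returns no rows rather than a column of zeros.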

Solution 1
0 2021-09-27 19:20:01