
Presto SQL query that counts the number of rows that satisfy a set of conditions within arithmetically progressing time intervals

I want to figure out how the number of rows that satisfy a set of conditions has changed over time. To this end, I would like to count the rows that satisfied the conditions as of a starting date, and then perform the same calculation for each following day up to the present day. My desired output table would look something like the table below (excluding the unnamed column):

|------------+------------+----------------------------|
|            |       date | rows_satisfying_conditions |
|------------+------------+----------------------------|
| start date | 2021-09-01 |                       2367 |
|            | 2021-09-02 |                       2784 |
|            | 2021-09-03 |                       3011 |
|            | 2021-09-04 |                       3601 |
| today      | 2021-09-05 |                       4155 |
|------------+------------+----------------------------|

A naive approach to generating the above table is to define one CTE per day and then join the CTEs (see the code below). The problem is that this is verbose and does not scale.

WITH day0 AS (
    SELECT count(*) AS day0
    FROM (
        SELECT DISTINCT account_id
        FROM default_table
        WHERE
            secret_column = 'secret value'
            AND lower(device_os) LIKE '%android%'
            AND from_iso8601_timestamp(timestamp) < from_iso8601_timestamp('2021-09-01T00:00:00.0000000Z') + interval '0' day
    )
),
day1 AS (
    SELECT count(*) AS day1
    FROM (
        SELECT DISTINCT account_id
        FROM default_table
        WHERE
            secret_column = 'secret value'
            AND lower(device_os) LIKE '%android%'
            AND from_iso8601_timestamp(timestamp) < from_iso8601_timestamp('2021-09-01T00:00:00.0000000Z') + interval '1' day
    )
),
-- ⋮ day2 and day3 are defined the same way
day4 AS (
    SELECT count(*) AS day4
    FROM (
        SELECT DISTINCT account_id
        FROM default_table
        WHERE
            secret_column = 'secret value'
            AND lower(device_os) LIKE '%android%'
            AND from_iso8601_timestamp(timestamp) < from_iso8601_timestamp('2021-09-01T00:00:00.0000000Z') + interval '4' day
    )
)
SELECT *
FROM
    day0
    FULL JOIN day1 ON TRUE
    FULL JOIN day2 ON TRUE
    FULL JOIN day3 ON TRUE
    FULL JOIN day4 ON TRUE

Does anyone have a suggestion for how I could compute the above table in a scalable manner?

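You can group the rows by date, count the rows satisfying the condition on each date, and then apply the sum window function ordered by date with the frame range between unbounded preceding and current row, which turns the per-day counts into a running total: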

WITH dataset (date, condition) AS
(
  VALUES
  (date '2021-09-01', true),
  (date '2021-09-01', true),
  (date '2021-09-01', false),
  (date '2021-09-02', true),
  (date '2021-09-03', true)
)

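SELECT
  date,
  -- running total of the per-day counts, from the first date up to the current row
  sum(cnt) OVER (ORDER BY date RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_satisfying_conditions
FROM (
  -- number of rows satisfying the condition on each date
  SELECT date, sum(CASE WHEN condition THEN 1 ELSE 0 END) AS cnt
  FROM dataset
  GROUP BY date
)
ORDER BY date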

Output:

|------------+----------------------------|
|       date | rows_satisfying_conditions |
|------------+----------------------------|
| 2021-09-01 |                          2 |
| 2021-09-02 |                          3 |
| 2021-09-03 |                          4 |
|------------+----------------------------|
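One caveat when applying this to the question's own query: the CTEs there count DISTINCT account_id values, and summing per-day distinct counts would count an account again on every day it has a qualifying row. A minimal sketch that keeps the distinct semantics takes each account's first qualifying day, counts new accounts per day over a calendar built with sequence() and UNNEST, and adds a running total. Table and column names are the question's. The reporting range (2021-09-01 to current_date) and the assumption that the timestamp strings are UTC ISO-8601 values like '2021-09-01T00:00:00.0000000Z' are my own. Each output row counts accounts whose first qualifying event falls on or before that date.

```sql
WITH first_seen AS (
  -- first (UTC) day on which each account satisfied the conditions;
  -- assumes the timestamp strings are UTC ISO-8601 values
  SELECT account_id,
         min(date(from_iso8601_timestamp(timestamp))) AS first_date
  FROM default_table
  WHERE secret_column = 'secret value'
    AND lower(device_os) LIKE '%android%'
  GROUP BY account_id
),
days AS (
  -- one row per day in the reporting range (range is an assumption)
  SELECT d
  FROM UNNEST(sequence(date '2021-09-01', current_date, interval '1' day)) AS t (d)
),
daily AS (
  -- accounts that qualify for the first time on each day (0 if none)
  SELECT days.d, count(f.account_id) AS new_accounts
  FROM days
  LEFT JOIN first_seen AS f ON f.first_date = days.d
  GROUP BY days.d
),
baseline AS (
  -- accounts that already qualified before the reporting range starts
  SELECT count(*) AS n
  FROM first_seen
  WHERE first_date < date '2021-09-01'
)
SELECT
  daily.d AS date,
  -- accounts whose first qualifying day is on or before daily.d
  baseline.n + sum(daily.new_accounts) OVER (
    ORDER BY daily.d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS accounts_so_far
FROM daily
CROSS JOIN baseline
ORDER BY daily.d
```

Because each account contributes a single row to first_seen, the table is scanned once no matter how many days are reported. That is what lets this approach scale where one CTE per day does not.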
