SQL: querying with a time series

Question

I have the following table in BigQuery:

Timestamp         variant_id  activity
2020-04-02 08:50  1           active
2020-04-03 07:39  1           not_active
2020-04-04 07:40  1           active
2020-04-05 10:22  2           active
2020-04-07 07:59  2           not_active

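I want to query this subset of the data to get the number of active variants per day.

If variant_id 1 is active on 2020-04-04, it remains active on the following dates (2020-04-05, 2020-04-06, and so on) until the value of the activity column becomes not_active. The goal is to count, for each day, the number of variant_id values whose activity is active, keeping in mind that on any given date each variant_id carries the value of its most recent activity.

For example, the result of the desired query on this subset must be: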

Date       activity_count
2020-04-02  1
2020-04-03  0
2020-04-04  1
2020-04-05  2
2020-04-06  2
2020-04-07  1
2020-04-08  1
2020-04-09  1
2020-04-10  1

Any help, please?

Answer 1

Consider the approach below:

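select date, count(distinct if(activity = 'active', variant_id, null)) activity_count
from (
  -- for each event, also fetch the date of the same variant's next event
  select date(timestamp) date, variant_id, activity,
    lead(date(timestamp)) over(partition by variant_id order by timestamp) next_date
  from your_table
  -- expand each event into one row per day, up to the day before the next event;
  -- a variant's last event is extended to the hard-coded end date 2020-04-10
  ), unnest(generate_date_array(date, ifnull(next_date - 1, '2020-04-10'))) date
group by date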

If applied to the sample data in your question, the output matches the expected result listed there.
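To try the approach without creating a table, the query can be run against the sample rows inlined as a CTE. This is only a sketch under a few assumptions: the names `events`, `spans`, `ts`, `start_date` and `d` are illustrative and do not come from the original answer, the end date 2020-04-10 is taken from the expected result in the question, and `date_sub` is used in place of `next_date - 1` purely for explicitness.

```sql
-- Sample rows copied from the question; `events` and `ts` are illustrative names.
with events as (
  select timestamp '2020-04-02 08:50:00' as ts, 1 as variant_id, 'active' as activity union all
  select timestamp '2020-04-03 07:39:00', 1, 'not_active' union all
  select timestamp '2020-04-04 07:40:00', 1, 'active' union all
  select timestamp '2020-04-05 10:22:00', 2, 'active' union all
  select timestamp '2020-04-07 07:59:00', 2, 'not_active'
),
spans as (
  -- One row per event: the day it happened and the day of the variant's next event.
  select
    date(ts) as start_date,
    variant_id,
    activity,
    lead(date(ts)) over (partition by variant_id order by ts) as next_date
  from events
)
select
  d as date,
  -- Count only the variants whose status covering this day is 'active'.
  count(distinct if(activity = 'active', variant_id, null)) as activity_count
from spans,
  -- Expand each status to every day it stays in effect. The last status of a
  -- variant has no next event, so it is carried forward to the assumed end date.
  unnest(generate_date_array(
    start_date,
    ifnull(date_sub(next_date, interval 1 day), date '2020-04-10')
  )) as d
group by d
order by d
```

On these five rows the query should return the nine rows listed in the question, 2020-04-02 through 2020-04-10. If a variant has two events on the same day, the earlier one produces an empty date array and drops out, so only the last event of that day counts. For a rolling report, replacing the literal end date with `current_date()` is one option.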

Solution 1
1 2022-01-25 21:53:17
