
Query that counts total records per day and total records with the same timestamp and id per day in BigQuery

I have time series data like this:

time                     id  value
2018-04-25 22:00:00 UTC  A   1
2018-04-25 23:00:00 UTC  A   2
2018-04-25 23:00:00 UTC  A   2.1
2018-04-25 23:00:00 UTC  B   1
2018-04-26 23:00:00 UTC  B   1.3

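How can I write a query that produces an output table with these columns:

  • date: the truncated time
  • records: the number of records on that date
  • records_conflicting_time_id: the number of records on that date whose combination of time and id is not unique. In the sample data above, the two records with id == A at 2018-04-25 23:00:00 UTC would be counted for date 2018-04-25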

So the output of our query should be:

date        records  records_conflicting_time_id
2018-04-25  4        2
2018-04-26  1        0

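Getting records is easy: I just truncate time to get the date and then group by date. But I am really struggling to produce a column that counts the records whose id + time is not unique within that date...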

with YOUR_DATA as
  (
              select cast('2018-04-25 22:00:00 UTC' as timestamp) as `time`, 'A' as id, 1.0 as value
    union all select cast('2018-04-25 23:00:00 UTC' as timestamp) as `time`, 'A' as id, 2.0 as value
    union all select cast('2018-04-25 23:00:00 UTC' as timestamp) as `time`, 'A' as id, 2.1 as value
    union all select cast('2018-04-25 23:00:00 UTC' as timestamp) as `time`, 'B' as id, 1.0 as value
    union all select cast('2018-04-26 23:00:00 UTC' as timestamp) as `time`, 'B' as id, 1.3 as value
  )
-- Caveat: the "+1" adjustment is exact only when a day has at most one
-- duplicated (time, id) group; with several such groups it undercounts.
select  cast(timestamp_trunc(t1.`time`, day) as date) as `date`,
        count(*) as records,
        case when count(*)-count(distinct cast(t1.`time` as string) || t1.id) = 0 then 0
          else count(*)-count(distinct cast(t1.`time` as string) || t1.id)+1
        end as records_conflicting_time_id
from    YOUR_DATA t1
group by cast(timestamp_trunc(t1.`time`, day) as date)
;
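
One thing worth checking about the query above is how the +1 adjustment behaves when a day has more than one duplicated (time, id) group. The sketch below is only a local check under stated assumptions: it uses Python's sqlite3 as a stand-in for BigQuery (so `date(time)` replaces `timestamp_trunc` and the table name `your_data` is illustrative), and it adds one hypothetical row that is not in the question so that 2018-04-25 has two duplicated groups.

```python
import sqlite3

# Sample rows from the question plus one HYPOTHETICAL extra row (id B at 23:00),
# so that 2018-04-25 contains two duplicated (time, id) groups.
ROWS = [
    ("2018-04-25 22:00:00", "A", 1.0),
    ("2018-04-25 23:00:00", "A", 2.0),
    ("2018-04-25 23:00:00", "A", 2.1),
    ("2018-04-25 23:00:00", "B", 1.0),
    ("2018-04-25 23:00:00", "B", 1.5),  # hypothetical, not in the question
    ("2018-04-26 23:00:00", "B", 1.3),
]

# SQLite translation (an assumption, not BigQuery syntax) of the
# count(*) - count(distinct ...) + 1 query.
PLUS_ONE_QUERY = """
select date(time) as day,
       count(*) as records,
       case when count(*) - count(distinct time || id) = 0 then 0
            else count(*) - count(distinct time || id) + 1
       end as records_conflicting_time_id
from your_data
group by date(time)
order by day
"""


def run_plus_one_query(rows):
    """Load rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table your_data (time text, id text, value real)")
    conn.executemany("insert into your_data values (?, ?, ?)", rows)
    result = conn.execute(PLUS_ONE_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_plus_one_query(ROWS):
        print(row)
```

With the extra row, four records on 2018-04-25 share a (time, id) pair with another record, but this query reports 3 for that day. On the question's original five rows it still returns the expected 2.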

Consider the approach below

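select date(time) date,
  sum(cnt) records,
  -- count only rows whose (time, id) pair occurs more than once
  sum(if(cnt > 1, cnt, 0)) conflicting_records
from (
  -- one row per (time, id) pair, with its number of occurrences
  select time, id, count(*) cnt
  from your_table
  group by time, id
)
group by date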

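If applied to the sample data in your question, the output matches the expected result: 4 records and 2 conflicting records for 2018-04-25, and 1 record and 0 conflicting records for 2018-04-26.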
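
To sanity-check the two-level aggregation locally, here is a minimal sketch that runs the same idea against the question's sample rows. It assumes Python's sqlite3 as a stand-in for BigQuery, so `if(...)` is written as a `case` expression, the subquery alias `per_key` is illustrative, and timestamps are stored as plain text; it is not BigQuery syntax.

```python
import sqlite3

# The five sample rows from the question (timestamps stored as text).
SAMPLE_ROWS = [
    ("2018-04-25 22:00:00", "A", 1.0),
    ("2018-04-25 23:00:00", "A", 2.0),
    ("2018-04-25 23:00:00", "A", 2.1),
    ("2018-04-25 23:00:00", "B", 1.0),
    ("2018-04-26 23:00:00", "B", 1.3),
]

# Inner query: one row per (time, id) pair with its occurrence count.
# Outer query: per day, total rows and rows belonging to pairs seen more than once.
TWO_LEVEL_QUERY = """
select date(time) as day,
       sum(cnt) as records,
       sum(case when cnt > 1 then cnt else 0 end) as conflicting_records
from (
  select time, id, count(*) as cnt
  from your_table
  group by time, id
) as per_key
group by date(time)
order by day
"""


def count_conflicts(rows):
    """Load rows into an in-memory SQLite table and run the aggregation."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table your_table (time text, id text, value real)")
    conn.executemany("insert into your_table values (?, ?, ?)", rows)
    result = conn.execute(TWO_LEVEL_QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for day, records, conflicting in count_conflicts(SAMPLE_ROWS):
        print(day, records, conflicting)
```

Because each (time, id) pair is counted first and only then summed per day, any number of duplicated pairs within a day contributes its full row count to conflicting_records.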
