简体   繁体   中英

Query that counts total records per day and total records with same time timestamp and id per day in Bigquery

I have timeseries data like this:

time id value
2018-04-25 22:00:00 UTC A 1
2018-04-25 23:00:00 UTC A 2
2018-04-25 23:00:00 UTC A 2.1
2018-04-25 23:00:00 UTC B 1
2018-04-26 23:00:00 UTC B 1.3

How do i write a query to produce an output table with these columns:

  • date: the truncated time
  • records: the number of records during this date
  • records_conflicting_time_id: the number of records during this date where the combination of time , id are not unique. In the example data above the two records with id==A at 2018-04-25 23:00:00 UTC would be counted for date 2018-04-25

So the output of our query should be:

date records records_conflicting_time_id
2018-04-25 4 2
2018-04-26 1 0

Getting records is easy, i just truncate the time to get date and then group by date . But i'm really struggling to produce a column that counts the number of records where id + time is not unique over that date...

with YOUR_DATA as
  (
              select  cast('2018-04-25 22:00:00 UTC' as timestamp) as `time`, 'A' as id, 1.0 as value
    union all select  cast('2018-04-25 23:00:00 UTC' as timestamp) as `time`, 'A' as id, 2.0 as value
    union all select cast('2018-04-25 23:00:00 UTC' as timestamp) as `time`, 'A' as id, 2.1 as value
    union all select cast('2018-04-25 23:00:00 UTC' as timestamp) as `time`, 'B' as id, 1.0 as value
    union all select cast('2018-04-26 23:00:00 UTC' as timestamp) as `time`, 'B' as id, 1.3 as value
  )
select  cast(timestamp_trunc(t1.`time`, day) as date) as `date`,
        count(*) as records,
        case when count(*)-count(distinct cast(t1.`time` as string) || t1.id) = 0 then 0
          else count(*)-count(distinct cast(t1.`time` as string) || t1.id)+1
        end as records_conflicting_time_id
from    YOUR_DATA t1
group by cast(timestamp_trunc(t1.`time`, day) as date)
;

Consider below approach

select date(time) date, 
  sum(cnt) records, 
  sum(if(cnt > 1, cnt, 0)) conflicting_records
from (
  select time, id, count(*) cnt
  from your_table
  group by time, id
)
group by date              

if applied to sample data in your question - output is

在此处输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM