
Query that counts total records per day and total records with the same timestamp and id per day in BigQuery

I have time-series data like this:

time                     id  value
2018-04-25 22:00:00 UTC  A   1
2018-04-25 23:00:00 UTC  A   2
2018-04-25 23:00:00 UTC  A   2.1
2018-04-25 23:00:00 UTC  B   1
2018-04-26 23:00:00 UTC  B   1.3

How do I write a query that produces an output table with these columns:

  • date: the time truncated to the day
  • records: the number of records on that date
  • records_conflicting_time_id: the number of records on that date whose combination of time and id is not unique. In the example data above, the two records with id == A at 2018-04-25 23:00:00 UTC would be counted for date 2018-04-25.

So the output of the query should be:

date        records  records_conflicting_time_id
2018-04-25  4        2
2018-04-26  1        0

Getting records is easy: I just truncate time to get the date and then group by date. But I'm really struggling to produce a column that counts the number of records where id + time is not unique on that date...
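To make the definition of records_conflicting_time_id concrete, here is a minimal Python sketch (not BigQuery) that first counts how many rows share each exact (time, id) key and then rolls those counts up per date. It assumes the timestamps are plain strings without the " UTC" suffix so that datetime.fromisoformat can parse them; ROWS and daily_counts are illustrative names, not part of the question.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Sample rows from the question as (time, id, value).
# Assumption: the " UTC" suffix is dropped so datetime.fromisoformat can parse them.
ROWS = [
    ("2018-04-25 22:00:00", "A", 1.0),
    ("2018-04-25 23:00:00", "A", 2.0),
    ("2018-04-25 23:00:00", "A", 2.1),
    ("2018-04-25 23:00:00", "B", 1.0),
    ("2018-04-26 23:00:00", "B", 1.3),
]


def daily_counts(rows):
    """Return {date: (records, records_conflicting_time_id)} (hypothetical helper)."""
    # Number of rows that share each exact (time, id) key.
    key_counts = Counter((t, i) for t, i, _ in rows)
    result = defaultdict(lambda: [0, 0])
    for (t, _id), n in key_counts.items():
        day = datetime.fromisoformat(t).date()  # truncate the timestamp to its date
        result[day][0] += n  # every row counts toward `records`
        if n > 1:  # the (time, id) key is not unique,
            result[day][1] += n  # so all of its rows are "conflicting"
    return {day: tuple(v) for day, v in sorted(result.items())}


for day, (records, conflicting) in daily_counts(ROWS).items():
    print(day, records, conflicting)
```

On the sample data this prints `2018-04-25 4 2` and `2018-04-26 1 0`, which matches the expected output in the question.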

with YOUR_DATA as
  (
              select cast('2018-04-25 22:00:00 UTC' as timestamp) as `time`, 'A' as id, 1.0 as value
    union all select cast('2018-04-25 23:00:00 UTC' as timestamp) as `time`, 'A' as id, 2.0 as value
    union all select cast('2018-04-25 23:00:00 UTC' as timestamp) as `time`, 'A' as id, 2.1 as value
    union all select cast('2018-04-25 23:00:00 UTC' as timestamp) as `time`, 'B' as id, 1.0 as value
    union all select cast('2018-04-26 23:00:00 UTC' as timestamp) as `time`, 'B' as id, 1.3 as value
  )
select  cast(timestamp_trunc(t1.`time`, day) as date) as `date`,
        count(*) as records,
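        -- caveat: count(*) - count(distinct ...) + 1 is exact only when the day has
        -- at most one duplicated (time, id) pair; with several pairs it undercounts
        case when count(*)-count(distinct cast(t1.`time` as string) || t1.id) = 0 then 0
          else count(*)-count(distinct cast(t1.`time` as string) || t1.id)+1
        end as records_conflicting_time_id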
from    YOUR_DATA t1
group by cast(timestamp_trunc(t1.`time`, day) as date)
;
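The CASE expression in the query above derives the conflicting count from count(*) - count(distinct ...) + 1. The following Python sketch mimics that arithmetic for a single day and compares it with a direct count of rows whose (time, id) key repeats. Times are shortened to "HH:MM" strings for brevity, and the second day is hypothetical data (not from the question) containing two separate duplicated pairs. On that day the formula gives 3 while four rows actually conflict, which suggests the expression is exact only when a day has at most one duplicated pair.

```python
from collections import Counter


def answer1_formula(keys):
    """Mimic the CASE expression: 0 if all keys are distinct, else total - distinct + 1."""
    total, distinct = len(keys), len(set(keys))
    return 0 if total == distinct else total - distinct + 1


def exact_conflicts(keys):
    """Count every row whose (time, id) key occurs more than once."""
    counts = Counter(keys)
    return sum(n for n in counts.values() if n > 1)


# 2018-04-25 from the question: one duplicated (time, id) pair.
question_day = [("22:00", "A"), ("23:00", "A"), ("23:00", "A"), ("23:00", "B")]
# Hypothetical day (not in the question) with TWO duplicated pairs.
two_groups_day = [("22:00", "A"), ("22:00", "A"), ("23:00", "B"), ("23:00", "B")]

print(answer1_formula(question_day), exact_conflicts(question_day))      # 2 2
print(answer1_formula(two_groups_day), exact_conflicts(two_groups_day))  # 3 4
```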

Consider the approach below:

select date(time) date, 
  sum(cnt) records, 
  sum(if(cnt > 1, cnt, 0)) conflicting_records
from (
  select time, id, count(*) cnt
  from your_table
  group by time, id
)
group by date              
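One way to try this two-level aggregation locally is Python's built-in sqlite3 module. This is a sketch under stated assumptions: SQLite stands in for BigQuery, the " UTC" suffix is dropped because SQLite's date() expects formats such as YYYY-MM-DD HH:MM:SS, and sum(if(...)) is written as a CASE expression, which is portable across SQLite versions. The inner query counts rows per (time, id); the outer query sums those counts per date and adds a group's count to the conflicting column only when it is greater than 1.

```python
import sqlite3

# In-memory SQLite table standing in for `your_table` (assumption: SQLite, not BigQuery).
conn = sqlite3.connect(":memory:")
conn.execute("create table your_table (time text, id text, value real)")
conn.executemany(
    "insert into your_table values (?, ?, ?)",
    [
        ("2018-04-25 22:00:00", "A", 1.0),
        ("2018-04-25 23:00:00", "A", 2.0),
        ("2018-04-25 23:00:00", "A", 2.1),
        ("2018-04-25 23:00:00", "B", 1.0),
        ("2018-04-26 23:00:00", "B", 1.3),
    ],
)

QUERY = """
select date(time) as day,
       sum(cnt) as records,
       sum(case when cnt > 1 then cnt else 0 end) as conflicting_records
from (
  -- one row per (time, id) with the number of records sharing it
  select time, id, count(*) as cnt
  from your_table
  group by time, id
)
group by day
order by day
"""

rows = conn.execute(QUERY).fetchall()
conn.close()
for row in rows:
    print(row)
```

Grouping by (time, id) first means each group's count already tells you whether that pair is duplicated, so the outer query only has to sum.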

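If applied to the sample data in your question, the output matches the expected table in the question: 4 records with 2 conflicting on 2018-04-25, and 1 record with 0 conflicting on 2018-04-26. (The original answer showed this result as a screenshot.)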

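Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.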

Related questions:
  • Get latest balance by day per user via SQL (Bigquery)
  • firebase quota exceeded (per day?)
  • How to get cumulative total users but ignoring the users who already appear in previous day? using bigquery
  • BigQuery: counts iterations per work block
  • BigQuery Running Count of Unique ID per Year
  • Is there a way to get the Pub/Sub -> Dataflow -> BigQuery template to cope with multiple records per message?
  • BigQuery retrieve all rows where timestamp is n minutes before or after a given time of the day
  • keeping a running total of documents field in Firebase based upon the day of month
  • CloudWatch Query join two records having same id
  • How to create a new table that only keeps rows with more than 5 data records under the same id in Bigquery
 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM