BigQuery: Scalar subquery produced more than one element - aggregating datetimes into an array per time interval

I'm trying to find the events that occurred within a specific time interval (a different interval per row) and add them as a column. The two tables are attached at the end: (1) time intervals, (2) datetime events.

First, I added a column that holds all the datetime events as an array on every row.

Second, I used this code to count how many datetimes fall in each interval:

-- count how many urine-output chart-events for every hourly interval
SELECT T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH,
sum((SELECT count(*) FROM UNNEST(twi.ca) as x WHERE x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH)) NUMBER_OF_OUTPUTS_IN_INTERVAL
FROM TIMES_WITH_INTERVALS twi
group by T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH

When I tried to add another column containing the datetimes that occur in the interval (as an array), I got:

Scalar subquery produced more than one element

This is the code I used:

SELECT T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH,
sum((SELECT count(*) FROM UNNEST(twi.ca) as x WHERE x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH)) NUMBER_OF_OUTPUTS_IN_INTERVAL,
ARRAY_AGG(FORMAT("%T",(SELECT * FROM UNNEST(twi.ca) as x WHERE x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH))) AS ARRAY_OF_TIMES_IN_INTERVALL
FROM TIMES_WITH_INTERVALS twi
group by T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH

THE BIGGER PICTURE

I have a table (picture 2 at the end) with a datetime stamp and a volume measurement. I want to aggregate the measured values into rounded time intervals, which means being able to do arithmetic with the measurements of each event.

I'm open to a different approach.

The calculation I want to do is the sum of:

  1. For the first measurement event in the time interval: take the time difference (in minutes) to the datetime stamp of the previous measurement in the original table, divide the value by that time difference, and multiply the result by the time elapsed since the beginning of the interval.
  2. For all the other measurement events in the time interval: simply add them to the sum.
  3. For the first measurement after the interval: the same logic as for the first measurement, but adding the complementary part to the sum.
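
One way to read the three steps above is that each measurement is spread evenly over the span since the previous measurement, and only the part of that span overlapping the interval is counted. A minimal sketch of that reading follows. The inline `uo` and `intervals` rows are made-up sample data, and the `spans` and `uo_in_interval` names are assumptions for illustration, not part of the original tables:

```sql
WITH uo AS (
  -- hypothetical measurements: timestamp + volume
  SELECT DATETIME '2020-01-01 09:30:00' AS charttime, 30 AS value UNION ALL
  SELECT DATETIME '2020-01-01 10:20:00', 100 UNION ALL
  SELECT DATETIME '2020-01-01 10:50:00', 60 UNION ALL
  SELECT DATETIME '2020-01-01 11:30:00', 80
),
intervals AS (
  -- hypothetical single interval
  SELECT DATETIME '2020-01-01 10:00:00' AS TIME_INTERVAL_STARTS,
         DATETIME '2020-01-01 11:00:00' AS TIME_INTERVAL_FINISH
),
spans AS (
  -- pair every measurement with the timestamp of the previous one
  SELECT LAG(charttime) OVER (ORDER BY charttime) AS prev_time,
         charttime,
         value
  FROM uo
)
SELECT i.TIME_INTERVAL_STARTS,
       i.TIME_INTERVAL_FINISH,
       -- value * (minutes of the span inside the interval) / (minutes of the whole span)
       SUM(SAFE_DIVIDE(
         s.value * DATETIME_DIFF(LEAST(s.charttime, i.TIME_INTERVAL_FINISH),
                                 GREATEST(s.prev_time, i.TIME_INTERVAL_STARTS),
                                 MINUTE),
         DATETIME_DIFF(s.charttime, s.prev_time, MINUTE))) AS uo_in_interval
FROM intervals AS i
JOIN spans AS s
  -- keep only the spans that overlap the interval
  ON s.prev_time < i.TIME_INTERVAL_FINISH
 AND s.charttime > i.TIME_INTERVAL_STARTS
GROUP BY i.TIME_INTERVAL_STARTS, i.TIME_INTERVAL_FINISH;
```

With these sample rows the three contributions are 40 (20 of 50 minutes of the 100 reading), 60 (fully inside) and 20 (10 of 40 minutes of the 80 reading), so `uo_in_interval` should come to 120. On real data the `LAG` would probably need a `PARTITION BY` on whatever identifies a patient or stay, and `DATETIME_DIFF` counts whole-minute boundaries, so sub-minute precision would need a finer date part.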

ULTIMATE SOLUTION FOR ME

Thanks to the great help of Gordon Linoff, I was able to run my code properly. When I continued working on the "bigger picture" I needed a solution with more flexibility for the unnested array, and ended up combining Gordon Linoff's solution with CASE expressions:

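SELECT T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH,  -- only grouped columns; SELECT * would conflict with the GROUP BY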
      COUNT(case when x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH then uo.charttime end) AS NUMBER_OF_OUTPUTS_IN_INTERVAL,
      ARRAY_AGG(case when x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH then uo.charttime end IGNORE NULLS) AS ARRAY_OF_TIMES_IN_INTERVALL,
      ARRAY_AGG(case when x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH then uo.value end IGNORE NULLS) AS ARRAY_OF_UO,
      ARRAY_REVERSE(ARRAY_AGG(case when x <= twi.TIME_INTERVAL_STARTS then x end IGNORE NULLS))[OFFSET(0)] AS TIME_BEFORE,
      ARRAY_AGG(case when x > twi.TIME_INTERVAL_FINISH then x end IGNORE NULLS)[OFFSET(0)] AS TIME_AFTER,
      ARRAY_AGG(case when x > twi.TIME_INTERVAL_FINISH then uo.value end IGNORE NULLS)[OFFSET(0)] AS UO_AFTER,
FROM TIMES_WITH_INTERVALS twi 
      LEFT JOIN UNNEST(twi.ca) x
      ON true
      LEFT JOIN uo
      ON uo.charttime = x
GROUP BY T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH
ORDER BY T_PLUS
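
One caveat about the query above: `ARRAY_AGG` without an `ORDER BY` does not guarantee element order, so `[OFFSET(0)]` is not certain to pick the event nearest to the interval. A hedged sketch of one way to pin this down is to order and limit inside the aggregate. The `sample` CTE and its values are made-up data for illustration:

```sql
WITH sample AS (
  -- hypothetical row: one interval plus the array of all event times
  SELECT DATETIME '2020-01-01 10:00:00' AS TIME_INTERVAL_STARTS,
         DATETIME '2020-01-01 11:00:00' AS TIME_INTERVAL_FINISH,
         [DATETIME '2020-01-01 09:10:00', DATETIME '2020-01-01 09:40:00',
          DATETIME '2020-01-01 10:30:00', DATETIME '2020-01-01 11:15:00',
          DATETIME '2020-01-01 11:45:00'] AS ca
)
SELECT s.TIME_INTERVAL_STARTS,
       s.TIME_INTERVAL_FINISH,
       -- latest event at or before the interval start
       ARRAY_AGG(IF(x <= s.TIME_INTERVAL_STARTS, x, NULL) IGNORE NULLS
                 ORDER BY x DESC LIMIT 1)[SAFE_OFFSET(0)] AS TIME_BEFORE,
       -- earliest event after the interval end
       ARRAY_AGG(IF(x > s.TIME_INTERVAL_FINISH, x, NULL) IGNORE NULLS
                 ORDER BY x LIMIT 1)[SAFE_OFFSET(0)] AS TIME_AFTER
FROM sample AS s
LEFT JOIN UNNEST(s.ca) AS x ON TRUE
GROUP BY s.TIME_INTERVAL_STARTS, s.TIME_INTERVAL_FINISH;
```

For this sample row, `TIME_BEFORE` should be 09:40 and `TIME_AFTER` 11:15. `SAFE_OFFSET` is used so that an interval with no earlier or later event yields NULL rather than an error.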

Screenshots of the original tables:

  1. time intervals
  2. datetime events

Put the format() and array_agg() in the subquery:

(SELECT ARRAY_AGG(FORMAT('%T', x))
 FROM UNNEST(twi.ca) AS x
 WHERE x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH
) AS ARRAY_OF_TIMES_IN_INTERVAL

That said, it is unclear to me why you are turning the timestamp into a string to store in an array. You can just have an array of the native type.
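
As a hedged illustration of keeping the native type: a scalar subquery may return at most one row, which is why the original expression fails, whereas an `ARRAY` subquery collects every matching row into one array value. The inline `TIMES_WITH_INTERVALS` row below is made-up sample data, not the asker's table:

```sql
WITH TIMES_WITH_INTERVALS AS (
  -- hypothetical row mirroring the column names used in the question
  SELECT 1 AS T_PLUS,
         DATETIME '2020-01-01 10:00:00' AS TIME_INTERVAL_STARTS,
         DATETIME '2020-01-01 11:00:00' AS TIME_INTERVAL_FINISH,
         [DATETIME '2020-01-01 09:40:00', DATETIME '2020-01-01 10:45:00',
          DATETIME '2020-01-01 10:15:00', DATETIME '2020-01-01 11:20:00'] AS ca
)
SELECT twi.T_PLUS,
       twi.TIME_INTERVAL_STARTS,
       twi.TIME_INTERVAL_FINISH,
       -- ARRAY(subquery) returns all matching rows as an ARRAY<DATETIME>
       ARRAY(SELECT x
             FROM UNNEST(twi.ca) AS x
             WHERE x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH
             ORDER BY x) AS ARRAY_OF_TIMES_IN_INTERVAL
FROM TIMES_WITH_INTERVALS AS twi;
```

Because the array is built per row, this form needs no GROUP BY, and the elements stay DATETIME values that can be compared or subtracted later.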

EDIT:

Your query should probably look like this:

SELECT T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH,
       COUNT(x) as cnt,
       ARRAY_AGG(x IGNORE NULLS) as timestamps  -- skip the NULL x produced when no event matches
FROM TIMES_WITH_INTERVALS twi LEFT JOIN
     UNNEST(twi.ca) x
     ON x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH
GROUP BY T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH;
