
BigQuery: "Scalar subquery produced more than one element" when aggregating datetimes into an array per time interval

I'm trying to find the events that occurred within a specific time interval (a different interval for each row) and add them as a column. The two tables are attached at the end: (1) time intervals, (2) datetime events.

First, I added a column holding all the datetime events as an array on every row.

Second, I used this code to count how many datetimes fall in each interval:

-- count how many urine-output chart-events for every hourly interval
SELECT T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH,
sum((SELECT count(*) FROM UNNEST(twi.ca) as x WHERE x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH)) NUMBER_OF_OUTPUTS_IN_INTERVAL
FROM TIMES_WITH_INTERVALS twi
group by T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH

When I tried to add another column containing the datetimes that occur in the interval (as an array), I got:

Scalar subquery produced more than one element

This is the code I used:

SELECT T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH,
sum((SELECT count(*) FROM UNNEST(twi.ca) as x WHERE x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH)) NUMBER_OF_OUTPUTS_IN_INTERVAL,
ARRAY_AGG(FORMAT("%T",(SELECT * FROM UNNEST(twi.ca) as x WHERE x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH))) AS ARRAY_OF_TIMES_IN_INTERVALL
FROM TIMES_WITH_INTERVALS twi
group by T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH

THE BIGGER PICTURE

I have a table (picture 2 at the end) with a datetime stamp and a volume measurement per row. I want to aggregate the measured values into rounded time intervals, which means being able to do arithmetic on the measurements of each event.

I'm open to a different approach.

The calculation I want to do is the sum of:

  1. For the first measurement event in the time interval: take the time difference (in minutes) to the datetime stamp of the previous measurement in the original table, divide the value by that difference, and multiply the result by the time elapsed from the beginning of the interval to the measurement.
  2. For all other measurement events in the time interval: simply add their values to the sum.
  3. For the first measurement after the interval: apply the same logic as for the first measurement, but add to the sum the complementary part (the share that falls inside the interval).
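One way to read the three rules above is as a single overlap weighting: each measurement is taken to cover the span from the previous measurement to its own timestamp, and it contributes the fraction of that span that overlaps the interval. The following is only a sketch under that reading, written in BigQuery SQL. The inline data, the single 10:00 to 11:00 interval, and the names `with_prev`, `prev_time`, `intervals`, `s`, `f`, and `volume_in_interval` are all made up for illustration, and a real table would probably need a PARTITION BY on a patient or stay identifier inside the LAG.

```sql
-- Hypothetical measurements standing in for the original events table.
WITH uo AS (
  SELECT DATETIME '2020-01-01 09:30:00' AS charttime, 60 AS value UNION ALL
  SELECT DATETIME '2020-01-01 10:30:00', 120 UNION ALL
  SELECT DATETIME '2020-01-01 10:45:00', 30 UNION ALL
  SELECT DATETIME '2020-01-01 11:30:00', 90
),
-- Attach the timestamp of the previous measurement to every row.
with_prev AS (
  SELECT charttime, value,
         LAG(charttime) OVER (ORDER BY charttime) AS prev_time
  FROM uo
),
-- Hypothetical single interval; s = start, f = finish.
intervals AS (
  SELECT DATETIME '2020-01-01 10:00:00' AS s,
         DATETIME '2020-01-01 11:00:00' AS f
)
SELECT
  i.s,
  i.f,
  -- value * (minutes of the measurement's span inside the interval)
  --       / (total minutes of the measurement's span)
  SUM(SAFE_DIVIDE(
    p.value * DATETIME_DIFF(LEAST(p.charttime, i.f), GREATEST(p.prev_time, i.s), MINUTE),
    DATETIME_DIFF(p.charttime, p.prev_time, MINUTE)
  )) AS volume_in_interval
FROM intervals AS i
JOIN with_prev AS p
  -- keep only measurements whose span (prev_time, charttime] overlaps the interval
  ON p.prev_time < i.f AND p.charttime > i.s
GROUP BY i.s, i.f;
```

With this made-up data the three contributing measurements add 120 * 30/60 + 30 + 90 * 15/45 = 120. The 09:30 row has no previous measurement, so it drops out of the join. SAFE_DIVIDE is used so that two measurements sharing a timestamp produce NULL rather than a division-by-zero error.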

ULTIMATE SOLUTION FOR ME

Thanks to the great help of Gordon Linoff, I was able to run my code properly. When I continued working on the "bigger picture", I needed a more flexible solution for the unnested array, and ended up combining Gordon Linoff's solution with CASE expressions:

SELECT twi.T_PLUS, twi.START_TIME_ROUNDED_UP, twi.TIME_INTERVAL_STARTS, twi.TIME_INTERVAL_FINISH,
      -- events inside the interval: their count, timestamps, and values
      COUNT(case when x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH then uo.charttime end) AS NUMBER_OF_OUTPUTS_IN_INTERVAL,
      ARRAY_AGG(case when x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH then uo.charttime end IGNORE NULLS ORDER BY x) AS ARRAY_OF_TIMES_IN_INTERVALL,
      ARRAY_AGG(case when x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH then uo.value end IGNORE NULLS ORDER BY x) AS ARRAY_OF_UO,
      -- latest event at or before the interval start (explicit ORDER BY, since ARRAY_AGG does not guarantee element order without one)
      ARRAY_AGG(case when x <= twi.TIME_INTERVAL_STARTS then x end IGNORE NULLS ORDER BY x DESC)[OFFSET(0)] AS TIME_BEFORE,
      -- earliest event after the interval end, and its value
      ARRAY_AGG(case when x > twi.TIME_INTERVAL_FINISH then x end IGNORE NULLS ORDER BY x)[OFFSET(0)] AS TIME_AFTER,
      ARRAY_AGG(case when x > twi.TIME_INTERVAL_FINISH then uo.value end IGNORE NULLS ORDER BY x)[OFFSET(0)] AS UO_AFTER
FROM TIMES_WITH_INTERVALS twi
      LEFT JOIN UNNEST(twi.ca) x
      ON true
      LEFT JOIN uo
      ON uo.charttime = x
GROUP BY T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH
ORDER BY T_PLUS

screenshots of original tables:

  1. time intervals
  2. datetime events

Put FORMAT() and ARRAY_AGG() inside the subquery, so that the subquery returns a single array rather than several rows:

(SELECT ARRAY_AGG(FORMAT('%T', x))
 FROM UNNEST(twi.ca) AS x
 WHERE x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH
) AS ARRAY_OF_TIMES_IN_INTERVAL

That said, it is quite unclear to me why you are turning the timestamp into a string to store in an array. You can just have an array of the native type.
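A self-contained sketch of this pattern, keeping the values as native DATETIME rather than strings. The inline `demo` data is invented for illustration and only stands in for TIMES_WITH_INTERVALS:

```sql
-- Hypothetical inline data standing in for TIMES_WITH_INTERVALS.
WITH demo AS (
  SELECT
    DATETIME '2020-01-01 10:00:00' AS TIME_INTERVAL_STARTS,
    DATETIME '2020-01-01 11:00:00' AS TIME_INTERVAL_FINISH,
    [DATETIME '2020-01-01 09:50:00',
     DATETIME '2020-01-01 10:15:00',
     DATETIME '2020-01-01 10:45:00'] AS ca
)
SELECT
  twi.TIME_INTERVAL_STARTS,
  twi.TIME_INTERVAL_FINISH,
  -- The subquery aggregates its own matching rows, so it yields exactly one
  -- value (an array) per outer row and remains a valid scalar subquery.
  (SELECT ARRAY_AGG(x ORDER BY x)
   FROM UNNEST(twi.ca) AS x
   WHERE x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH
  ) AS ARRAY_OF_TIMES_IN_INTERVAL
FROM demo AS twi;
```

Here only the 10:15 and 10:45 events fall inside the interval. If no element matches, ARRAY_AGG over zero rows returns NULL; writing the subquery as ARRAY(SELECT x FROM ... ORDER BY x) is an alternative that yields an empty array instead.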

EDIT:

Your query should probably look like this:

SELECT T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH,
       COUNT(x) as cnt,
       ARRAY_AGG(x) as timestamps
FROM TIMES_WITH_INTERVALS twi LEFT JOIN
     UNNEST(twi.ca) x
     ON x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH
GROUP BY T_PLUS, START_TIME_ROUNDED_UP, TIME_INTERVAL_STARTS, TIME_INTERVAL_FINISH;
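To try the join-based form without the real tables, here is a small sketch on invented inline data (the `demo` rows are assumptions, and the second interval deliberately contains no events). It adds IGNORE NULLS, which is not in the query above: for an interval with no match the LEFT JOIN produces a NULL `x`, and as far as I know BigQuery rejects a NULL element inside an array in the final result.

```sql
-- Hypothetical inline data standing in for TIMES_WITH_INTERVALS.
WITH demo AS (
  SELECT 1 AS T_PLUS,
         DATETIME '2020-01-01 10:00:00' AS TIME_INTERVAL_STARTS,
         DATETIME '2020-01-01 11:00:00' AS TIME_INTERVAL_FINISH,
         [DATETIME '2020-01-01 10:15:00', DATETIME '2020-01-01 10:45:00'] AS ca
  UNION ALL
  SELECT 2,
         DATETIME '2020-01-01 11:00:00',
         DATETIME '2020-01-01 12:00:00',
         [DATETIME '2020-01-01 10:15:00', DATETIME '2020-01-01 10:45:00']
)
SELECT twi.T_PLUS, twi.TIME_INTERVAL_STARTS, twi.TIME_INTERVAL_FINISH,
       -- COUNT(x) skips the NULL produced by an unmatched LEFT JOIN row
       COUNT(x) AS cnt,
       -- IGNORE NULLS keeps that same NULL out of the array
       ARRAY_AGG(x IGNORE NULLS ORDER BY x) AS timestamps
FROM demo AS twi
LEFT JOIN UNNEST(twi.ca) AS x
  ON x BETWEEN twi.TIME_INTERVAL_STARTS AND twi.TIME_INTERVAL_FINISH
GROUP BY twi.T_PLUS, twi.TIME_INTERVAL_STARTS, twi.TIME_INTERVAL_FINISH
ORDER BY twi.T_PLUS;
```

The first interval gets a count of 2 and the second a count of 0. Because the join is a LEFT JOIN, intervals without events still appear in the output instead of disappearing.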
