How do I add arrays in BigQuery SQL?

I have a UDF that returns a floating-point array of the same size for each row of a table. How do I sum the values of these arrays?

In other words, how can I do something like this:

create temp function f(...)
returns array<float64>
...;
select sum(f(column)) from table

As the result of this operation I need another array of the same size, where

result[i] = sum(over rows) f(row, column)[i]

So based on your comment, what you are looking for is the sum of the values of all your arrays. This is how you can do it using the UNNEST operator. Note that this returns a single total over every element of every array, not an element-wise array:

WITH mydata  AS (
  SELECT [1.4, 1.3, 1.4, 1.1] as myarray
  union all 
  SELECT [1.4, 1.3, 1.4, 1.1] as myarray
  union all 
  SELECT [1.4, 1.3, 1.4, 1.1] as myarray
)

SELECT SUM(eachelement) from mydata, UNNEST(myarray) AS eachelement; 

Here is a function that uses ANY TYPE in order to support summing arrays of FLOAT64, INT64, or NUMERIC, along with some sample input:

CREATE TEMP FUNCTION ElementWiseSum(arr1 ANY TYPE, arr2 ANY TYPE) AS (
  ARRAY(SELECT x + arr2[OFFSET(off)] FROM UNNEST(arr1) AS x WITH OFFSET off ORDER BY off)
);

SELECT arr1, arr2, ElementWiseSum(arr1, arr2) AS result
FROM (
  SELECT [1, 2, 3] AS arr1, [4, 5, 6] AS arr2 UNION ALL
  SELECT [7, 8], [9, 10] UNION ALL
  SELECT [], [] UNION ALL
  SELECT [11, 12, 13, 14, 15], [16, 17, 18, 19, 20]
);

It unnests arr1 using WITH OFFSET, then retrieves the corresponding element from arr2 using this offset, and orders by the offset to ensure that the element order is preserved.
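The sample input above only exercises INT64 arrays. As a minimal sketch of the other two types mentioned, the same function can be called with FLOAT64 and NUMERIC literals (the input values here are made up for illustration):

```sql
-- Same function as above, repeated so this snippet runs on its own.
CREATE TEMP FUNCTION ElementWiseSum(arr1 ANY TYPE, arr2 ANY TYPE) AS (
  ARRAY(SELECT x + arr2[OFFSET(off)] FROM UNNEST(arr1) AS x WITH OFFSET off ORDER BY off)
);

SELECT
  -- FLOAT64 arrays: expected [2.0, 3.0]
  ElementWiseSum([1.5, 2.5], [0.5, 0.5]) AS float_result,
  -- NUMERIC arrays: expected [4.4, 6.6]
  ElementWiseSum([NUMERIC '1.1', NUMERIC '2.2'],
                 [NUMERIC '3.3', NUMERIC '4.4']) AS numeric_result;
```

Because the function indexes arr2 with OFFSET, it should raise an out-of-bounds error when arr2 is shorter than arr1, so it is best suited to arrays known to have equal length.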

Edit: to sum across rows, you can unnest the arrays, compute sums grouped by the offset of the elements, then reaggregate the sums into a new array:

SELECT
  ARRAY_AGG(sum ORDER BY off) AS arr
FROM (
  SELECT
    off,
    SUM(x) AS sum
  FROM (
    SELECT [1, 2, 3] AS arr UNION ALL
    SELECT [7, 8, 9] UNION ALL
    SELECT [4, 5, 6] UNION ALL
    SELECT [10, 11, 12]
  ), UNNEST(arr) AS x WITH OFFSET off
  GROUP BY off
);
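To connect this back to the question, the same pattern can probably be applied directly to the UDF's output. The sketch below assumes a stand-in UDF f that returns [x, 2x, 3x] and a made-up three-row table; both are hypothetical and only there to make the snippet runnable.

```sql
-- Hypothetical stand-in for the UDF from the question.
CREATE TEMP FUNCTION f(x FLOAT64)
RETURNS ARRAY<FLOAT64>
AS ([x, x * 2, x * 3]);

WITH mytable AS (  -- made-up sample rows
  SELECT 1.0 AS mycolumn UNION ALL
  SELECT 2.0 UNION ALL
  SELECT 3.0
)
SELECT
  -- Rebuild the array from the per-position sums, in position order.
  ARRAY_AGG(total ORDER BY off) AS result
FROM (
  SELECT
    off,
    SUM(elem) AS total
  FROM mytable, UNNEST(f(mycolumn)) AS elem WITH OFFSET off
  GROUP BY off
);
-- result: [6.0, 12.0, 18.0]
```

Position 0 sums 1 + 2 + 3, position 1 sums 2 + 4 + 6, and position 2 sums 3 + 6 + 9, which matches result[i] = sum(over rows) f(row, column)[i] from the question.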

If your UDF is already defined (it takes your column(s) and returns a FLOAT64 array of a predetermined, fixed length), you can use a simplified solution. For example, for arrays of 3 elements, something like:

create temp function f(...)
returns array<float64>
...;

with dataset as (
  select arr[offset(0)] as col_a, arr[offset(1)] as col_b, arr[offset(2)] as col_c
    from (
       select f(mycolumn) as arr
       from `mydataset.mytable`
    )
)

select [sum(col_a), sum(col_b), sum(col_c)] as new_array from dataset
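One caveat with this approach: OFFSET raises an error if an array turns out to be shorter than expected. A hedged variant, assuming made-up sample data in place of the UDF output, uses SAFE_OFFSET, which returns NULL for a missing position so that SUM simply skips it:

```sql
WITH dataset AS (  -- made-up arrays standing in for f(mycolumn)
  SELECT [1.0, 2.0, 3.0] AS arr UNION ALL
  SELECT [4.0, 5.0] UNION ALL  -- shorter array: no element at position 2
  SELECT [6.0, 7.0, 8.0]
)
SELECT [
  SUM(arr[SAFE_OFFSET(0)]),
  SUM(arr[SAFE_OFFSET(1)]),
  SUM(arr[SAFE_OFFSET(2)])  -- NULL from the short row is ignored by SUM
] AS new_array
FROM dataset;
-- new_array: [11.0, 14.0, 11.0]
```

If a position is missing in every row, SUM returns NULL for it, and BigQuery does not allow NULL elements in a result array, so wrapping each sum in IFNULL(..., 0) may be needed in that case.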

This does not directly answer OP's question, but people landing on this page searching for "How do I add arrays in BigQuery SQL?" might benefit.

(Based on the edit in @elliott-brossard's answer.) In case you have 2 arrays, but 1 of them is an array of structs, you can use the following code to add them together:

WITH mydata AS (
  SELECT
    [1, 2, 3] AS arr
    -- ,[7, 8, 9] AS arr2
    ,[
      STRUCT(7 AS timeOnSite)
      ,STRUCT(8 AS timeOnSite)
      ,STRUCT(9 AS timeOnSite)
    ] AS arr2
)

SELECT
  (
    SELECT
      ARRAY_AGG(sum ORDER BY off) AS arr
    FROM (
      SELECT
        off,
        SUM(x) AS sum
      FROM (
        SELECT arr UNION ALL
        -- SELECT arr2
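        -- Extract the struct field in its original order; ARRAY_AGG without ORDER BY does not guarantee order.
        SELECT ARRAY(SELECT t.timeOnSite FROM UNNEST(arr2) AS t WITH OFFSET o ORDER BY o)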
      ), UNNEST(arr) AS x WITH OFFSET off
      GROUP BY off
    )  
  ) AS sum_arrays
FROM 
  mydata
