
Array concatenation with distinct elements in BigQuery

Suppose each row has an id and two arrays, array_1 and array_2, like this:

SELECT 'a' id, [1,2,3,4,5] array_1, [2,2,2,3,6] array_2 UNION ALL
SELECT 'b', [2,3,4,5,6], [7,7,8,6,9] UNION ALL
SELECT 'c', [], [1,4,5]

I want to concatenate the two arrays and keep only the unique elements in the new array. My desired output looks like this:

+----+-----------+-----------+-----------------------------+
| id |  array_1  |  array_2  | concatenated_array_distinct |
+----+-----------+-----------+-----------------------------+
| a  | 1,2,3,4,5 | 2,2,2,3,6 |                 1,2,3,4,5,6 |
| b  | 2,3,4,5,6 | 7,7,8,6,9 |             2,3,4,5,6,7,8,9 |
| c  |           |     1,4,5 |                       1,4,5 |
+----+-----------+-----------+-----------------------------+

I tried the array_concat function, but I could not find a way to make it keep only distinct elements.

Is there any way to get the desired output?

You can use unnest() together with union distinct:

with t as (
      select 'a' id, [1,2,3,4,5] array_1, [2,2,2,3,6] array_2 UNION ALL
      select 'b', [2,3,4,5,6], [7,7,8,6,9] UNION ALL
      select 'c', [], [1,4,5]
     )
select t.*,
       (select array_agg( e.el)
        from (select el
              from unnest(array_1) el
              union distinct 
              select el
              from unnest(array_2) el
             ) e 
       ) array_unique             
from t
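
One caveat with the query above: ARRAY_AGG without an ORDER BY clause does not guarantee the order of the elements in the resulting array. The variant below is a minimal sketch of the same unnest() plus union distinct approach with an ORDER BY added inside ARRAY_AGG, assuming ascending order is what you want (as in the desired output in the question). The alias array_unique is kept from the query above.

```sql
#standardSQL
-- Sketch: same idea as the answer above, with a deterministic element order.
WITH t AS (
  SELECT 'a' id, [1,2,3,4,5] array_1, [2,2,2,3,6] array_2 UNION ALL
  SELECT 'b', [2,3,4,5,6], [7,7,8,6,9] UNION ALL
  SELECT 'c', [], [1,4,5]
)
SELECT t.*,
  (
    -- UNION DISTINCT removes duplicates across both unnested arrays;
    -- ORDER BY inside ARRAY_AGG fixes the order of the aggregated elements.
    SELECT ARRAY_AGG(e.el ORDER BY e.el)
    FROM (
      SELECT el FROM UNNEST(array_1) el
      UNION DISTINCT
      SELECT el FROM UNNEST(array_2) el
    ) e
  ) AS array_unique
FROM t
```

If both arrays in a row are empty, the subquery aggregates zero rows and ARRAY_AGG returns NULL rather than an empty array, so wrap it in IFNULL(..., []) if that distinction matters.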

The following is for BigQuery Standard SQL.

... I tried the array_concat function, but I could not find a way to make it keep only distinct elements. ...

You were on the right track :o)

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'a' id, [1,2,3,4,5] array_1, [2,2,2,3,6] array_2 UNION ALL
  SELECT 'b', [2,3,4,5,6], [7,7,8,6,9] UNION ALL
  SELECT 'c', [], [1,4,5]
)
SELECT *, 
  ARRAY(SELECT DISTINCT x 
    FROM UNNEST(ARRAY_CONCAT(array_1, array_2)) x 
    ORDER BY x
  ) concatenated_array_distinct
FROM `project.dataset.table`  

A simple, readable, and maintainable solution:

#Declare the function once
#standardSQL
CREATE TEMP FUNCTION dedup(val ANY TYPE) AS ((
  SELECT ARRAY_AGG(t)
  FROM (SELECT DISTINCT * FROM UNNEST(val) v) t
));

 with t as (
      select 'a' id, [1,2,3,4,5] array_1, [2,2,2,3,6] array_2 UNION ALL
      select 'b', [2,3,4,5,6], [7,7,8,6,9] UNION ALL
      select 'c', [], [1,4,5]
     )
 select t.*, 
        dedup(array_1 || array_2) array_unique 
 from t

