
How to calculate average and median of all the values of an array column in BigQuery?

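I have a table data with array-type columns. For each row of the table, I want to calculate the average and median of each of the corresponding columns values_*.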

Sample table data

id   values_1   values_2
 a      2
        4
 b      10
        4
        16
 c     NULL
        6
 d     NULL
       NULL

Sample expected output:

id   avg_values_1   median_values_1    avg_values_2   median_values_2
a         3              3 
b         10             10
c         6              6
d        NULL           NULL

Below is for BigQuery Standard SQL

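#standardSQL
WITH temp AS (
  -- Rebuild each array without its NULL elements
  SELECT id, ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS values_1
  FROM `project.dataset.table`
)
SELECT id,
  -- Average of the array elements; NULL when the array is empty
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  -- PERCENTILE_CONT is an analytic function, so DISTINCT collapses its per-element rows into one value
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() AS median FROM UNNEST(values_1) AS i) AS median_values_1
FROM temp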

You can test the above using the sample data from your question, as in the example below

#standardSQL
WITH `project.dataset.table` AS (
  -- Inline sample data standing in for the real table
  SELECT 'a' id, [2, 4] values_1 UNION ALL
  SELECT 'b', [10, 4, 16] UNION ALL
  SELECT 'c', [6, NULL] UNION ALL
  SELECT 'd', [NULL, NULL]
), temp AS (
  -- Rebuild each array without its NULL elements
  SELECT id, ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS values_1
  FROM `project.dataset.table`
)
SELECT id,
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() AS median FROM UNNEST(values_1) AS i) AS median_values_1
FROM temp

with output

Row id  avg_values_1    median_values_1  
1   a   3.0             3.0  
2   b   10.0            10.0     
3   c   6.0             6.0  
4   d   null            null       

Note that I had to introduce the temp CTE first to eliminate NULL elements from the arrays
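To see the NULL-elimination step on its own, here is a minimal sketch that applies the same ARRAY(...) subquery to a literal array. The literal [6, NULL] and the alias cleaned are only illustrative.

```sql
#standardSQL
-- Sketch: keep only the non-NULL elements of an array.
-- The literal [6, NULL] stands in for a values_* column.
SELECT ARRAY(
  SELECT i
  FROM UNNEST([6, NULL]) AS i
  WHERE i IS NOT NULL
) AS cleaned
```

An array whose elements are all NULL becomes an empty array after this step, which is why row d ends up with null for both the average and the median.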

You can repeat this construct for as many columns as you need/have
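As a rough sketch of that repetition, the query below applies the same construct to a second column. It assumes values_2 exists in the table and is a numeric array like values_1. The sample data in the question leaves values_2 empty, so this is untested against real values.

```sql
#standardSQL
-- Sketch: the same construct repeated for two array columns.
-- Assumption: values_2 is an ARRAY of numbers, like values_1.
WITH temp AS (
  SELECT id,
    ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS values_1,
    ARRAY(SELECT i FROM UNNEST(values_2) AS i WHERE i IS NOT NULL) AS values_2
  FROM `project.dataset.table`
)
SELECT id,
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() FROM UNNEST(values_1) AS i) AS median_values_1,
  (SELECT AVG(i) FROM UNNEST(values_2) AS i) AS avg_values_2,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() FROM UNNEST(values_2) AS i) AS median_values_2
FROM temp
```

Each additional column needs one more line in temp and two more expressions in the final SELECT.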

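Or, if you have more than just a few columns, you can use the approach shown in https://stackoverflow.com/a/63105643/5221944 to dynamically build and execute the query for all columns at once!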
