
How to calculate average and median of all the values of an array column in BigQuery?

I have a table named data with array-type columns. For each row of the table, I want to calculate the average and the median of the elements in each of the values_* columns.

Example table data:

id   values_1       values_2
a    [2, 4]
b    [10, 4, 16]
c    [NULL, 6]
d    [NULL, NULL]

Sample expected output:

id   avg_values_1   median_values_1   avg_values_2   median_values_2
a    3              3
b    10             10
c    6              6
d    NULL           NULL

Below is for BigQuery Standard SQL:

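#standardSQL
WITH temp AS (
  -- Copy each array without its NULL elements
  SELECT id, ARRAY(SELECT i FROM UNNEST(values_1) i WHERE i IS NOT NULL) AS values_1
  FROM `project.dataset.table`
)
SELECT id,
  -- Mean of the remaining elements (NULL when the array is empty)
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  -- PERCENTILE_CONT is used as an analytic function, so DISTINCT collapses its per-element result to one value
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() AS median FROM UNNEST(values_1) AS i) AS median_values_1
FROM temp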

You can test and play with the above using the sample data from your question, as in the example below:

#standardSQL
-- Sample data from the question
WITH `project.dataset.table` AS (
  SELECT 'a' id, [2, 4] values_1 UNION ALL
  SELECT 'b', [10, 4, 16] UNION ALL
  SELECT 'c', [6, NULL] UNION ALL
  SELECT 'd', [NULL, NULL]
), temp AS (
  -- Copy each array without its NULL elements
  SELECT id, ARRAY(SELECT i FROM UNNEST(values_1) i WHERE i IS NOT NULL) AS values_1
  FROM `project.dataset.table`
)
SELECT id,
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() AS median FROM UNNEST(values_1) AS i) AS median_values_1
FROM temp

with output:

Row id  avg_values_1    median_values_1  
1   a   3.0             3.0  
2   b   10.0            10.0     
3   c   6.0             6.0  
4   d   null            null       

Note that I first had to introduce the temp CTE to eliminate NULL elements from the arrays.
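A sketch of an equivalent way to drop the NULL elements without the separate temp CTE, assuming the same project.dataset.table placeholder: each scalar subquery applies the i IS NOT NULL filter itself before aggregating. For row d every element is filtered out, so both subqueries see zero rows and return NULL, as in the output above.

```sql
#standardSQL
-- Variant without the temp CTE: each subquery drops NULL elements itself
SELECT id,
  (SELECT AVG(i) FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS avg_values_1,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER()
     FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS median_values_1
FROM `project.dataset.table`
```

The trade-off is that the filter is repeated in every subquery, whereas the temp CTE cleans each array once.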

You can repeat this construct for as many columns as you have.
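For example, a sketch covering both columns from the question. The values_1 arrays are the question's sample data; the values_2 arrays are hypothetical, because the question leaves that column empty.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- values_2 contents are hypothetical sample data
  SELECT 'a' id, [2, 4] values_1, [1, 3, 5] values_2 UNION ALL
  SELECT 'b', [10, 4, 16], [8, NULL] UNION ALL
  SELECT 'c', [6, NULL], [NULL, NULL] UNION ALL
  SELECT 'd', [NULL, NULL], [7, 9]
), temp AS (
  -- Strip NULL elements from every array column once
  SELECT id,
    ARRAY(SELECT i FROM UNNEST(values_1) i WHERE i IS NOT NULL) AS values_1,
    ARRAY(SELECT i FROM UNNEST(values_2) i WHERE i IS NOT NULL) AS values_2
  FROM `project.dataset.table`
)
SELECT id,
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() FROM UNNEST(values_1) AS i) AS median_values_1,
  (SELECT AVG(i) FROM UNNEST(values_2) AS i) AS avg_values_2,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() FROM UNNEST(values_2) AS i) AS median_values_2
FROM temp
```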

Or, if you have more than just a few columns, you can use the approach shown in https://stackoverflow.com/a/63105643/5221944 to dynamically build and execute the query for all columns at once.
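A rough sketch of one way to do this with BigQuery scripting; it is not necessarily the same as the approach in the linked answer. It assumes the placeholder table project.dataset.table, an id column, and that every array column to summarize has a name starting with values_. It reads the column names from INFORMATION_SCHEMA.COLUMNS, builds the average and median expressions with FORMAT and STRING_AGG, and runs the generated query with EXECUTE IMMEDIATE.

```sql
-- Sketch only: table name, id column and the values_ prefix are assumptions
DECLARE exprs STRING;

-- One AVG and one median expression per matching array column
SET exprs = (
  SELECT STRING_AGG(
    FORMAT("""
      (SELECT AVG(i) FROM UNNEST(%s) AS i WHERE i IS NOT NULL) AS avg_%s,
      (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER()
         FROM UNNEST(%s) AS i WHERE i IS NOT NULL) AS median_%s""",
      column_name, column_name, column_name, column_name),
    ',')
  FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
  WHERE table_name = 'table'
    AND STARTS_WITH(column_name, 'values_')
    AND STARTS_WITH(data_type, 'ARRAY')
);

-- Run the generated query against the table
EXECUTE IMMEDIATE FORMAT("""
  SELECT id, %s
  FROM `project.dataset.table`
""", exprs);
```

Because the NULL filter is applied inside each generated subquery, this version does not need the temp CTE.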
