How to calculate average and median of all the values of an array column in BigQuery?
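I have a table named data with array-typed columns. For each row of the table, I want to calculate the average and median of the values in each of the corresponding values_* columns.
Sample table data: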
id  values_1       values_2
a   [2, 4]
b   [10, 4, 16]
c   [NULL, 6]
d   [NULL, NULL]
Sample expected output:
id  avg_values_1  median_values_1  avg_values_2  median_values_2
a   3             3
b   10            10
c   6             6
d   NULL          NULL
Below is for BigQuery Standard SQL:
    #standardSQL
    WITH temp AS (
      -- Rebuild each array without its NULL elements
      SELECT id, ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS values_1
      FROM `project.dataset.table`
    )
    SELECT id,
      (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
      -- PERCENTILE_CONT is an analytic function, so DISTINCT collapses its per-element rows into one value
      (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() AS median FROM UNNEST(values_1) AS i) AS median_values_1
    FROM temp
You can test and play with the above using the sample data from your question, as in the example below:
    #standardSQL
    WITH `project.dataset.table` AS (
      SELECT 'a' id, [2, 4] values_1 UNION ALL
      SELECT 'b', [10, 4, 16] UNION ALL
      SELECT 'c', [6, NULL] UNION ALL
      SELECT 'd', [NULL, NULL]
    ), temp AS (
      SELECT id, ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS values_1
      FROM `project.dataset.table`
    )
    SELECT id,
      (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
      (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() AS median FROM UNNEST(values_1) AS i) AS median_values_1
    FROM temp
with output:
Row  id  avg_values_1  median_values_1
1    a   3.0           3.0
2    b   10.0          10.0
3    c   6.0           6.0
4    d   null          null
Note that I had to first introduce the temp CTE to eliminate NULL elements from the arrays.
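To look at that NULL-stripping step on its own, here is a minimal sketch that reuses rows 'c' and 'd' from the sample data. The column names length_before and length_after are made up for this illustration. It compares array lengths instead of returning the raw arrays, because BigQuery does not allow an array with NULL elements in a query result.

```sql
#standardSQL
-- Sketch only: length_before / length_after are illustrative names, not part of the answer.
WITH `project.dataset.table` AS (
  SELECT 'c' AS id, [6, NULL] AS values_1 UNION ALL
  SELECT 'd', [NULL, NULL]
)
SELECT
  id,
  -- Length of the array as stored, NULL elements included
  ARRAY_LENGTH(values_1) AS length_before,
  -- Length after rebuilding the array from its non-NULL elements only
  ARRAY_LENGTH(ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL)) AS length_after
FROM `project.dataset.table`
```

Row 'c' should go from 2 elements to 1, and row 'd' from 2 to 0. That empty array is why both aggregates come back as null for 'd'.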
You can repeat this construct for as many columns as you need/have.
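As a sketch of what repeating the construct could look like for two columns: the question does not show any values_2 contents, so the values_2 arrays below are invented purely for illustration.

```sql
#standardSQL
-- Sketch only: the values_2 sample arrays are hypothetical; the question gives no values_2 data.
WITH `project.dataset.table` AS (
  SELECT 'a' AS id, [2, 4] AS values_1, [1, 3, 5] AS values_2 UNION ALL
  SELECT 'b', [10, 4, 16], [NULL, 8]
), temp AS (
  -- One NULL-stripping expression per array column
  SELECT id,
    ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS values_1,
    ARRAY(SELECT i FROM UNNEST(values_2) AS i WHERE i IS NOT NULL) AS values_2
  FROM `project.dataset.table`
)
SELECT id,
  -- One AVG / median pair per array column
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() FROM UNNEST(values_1) AS i) AS median_values_1,
  (SELECT AVG(i) FROM UNNEST(values_2) AS i) AS avg_values_2,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() FROM UNNEST(values_2) AS i) AS median_values_2
FROM temp
```

Each additional column costs one line in temp and two lines in the final SELECT, which gets tedious beyond a handful of columns.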
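Or, if you have more than just a few columns, you can use the approach shown in https://stackoverflow.com/a/63105643/5221944 to dynamically build and execute a query for all columns at once!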
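The following is not a reproduction of the linked answer, only a rough sketch of the general idea using BigQuery scripting. It assumes the table is a real table (not a CTE) reachable as `project.dataset.table`, that every array column of interest is named values_*, and that a script variable called select_list (a name chosen here) is acceptable. For brevity it filters NULL elements inside each subquery rather than through a temp CTE.

```sql
-- Sketch only: select_list and the {c} placeholder are illustrative choices.
DECLARE select_list STRING;

-- Build one AVG / median expression pair per values_* column.
SET select_list = (
  SELECT STRING_AGG(
    REPLACE(
      '(SELECT AVG(i) FROM UNNEST({c}) AS i WHERE i IS NOT NULL) AS avg_{c}, '
      || '(SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() FROM UNNEST({c}) AS i WHERE i IS NOT NULL) AS median_{c}',
      '{c}', column_name),
    ', ' ORDER BY column_name)
  FROM `project.dataset.INFORMATION_SCHEMA.COLUMNS`
  WHERE table_name = 'table'
    AND STARTS_WITH(column_name, 'values_')
);

-- Run the generated query against the table.
EXECUTE IMMEDIATE FORMAT('SELECT id, %s FROM `project.dataset.table`', select_list);
```

The column list comes from INFORMATION_SCHEMA.COLUMNS, so newly added values_* columns would be picked up without editing the query text.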