How to calculate average and median of all the values of an array column in BigQuery?
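I have a table named data with array-typed columns. For each row of the table, I want to calculate the average and the median of each of the corresponding values_* columns.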
Sample table data:

id  values_1      values_2
a   [2, 4]
b   [10, 4, 16]
c   [NULL, 6]
d   [NULL, NULL]
Sample expected output:

id  avg_values_1  median_values_1  avg_values_2  median_values_2
a   3             3
b   10            10
c   6             6
d   NULL          NULL
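Below is for BigQuery Standard SQL

#standardSQL
WITH temp AS (
  -- Rebuild each array without its NULL elements
  SELECT id, ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS values_1
  FROM `project.dataset.table`
)
SELECT id,
  -- Average of the remaining elements (NULL when the array is empty)
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  -- PERCENTILE_CONT is an analytic function, so DISTINCT collapses its per-row result to one value
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() AS median FROM UNNEST(values_1) AS i) AS median_values_1
FROM temp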
You can test and play with the above using the sample data from your question, as in the example below
#standardSQL
WITH `project.dataset.table` AS (
  -- Inline sample data standing in for the real table
  SELECT 'a' AS id, [2, 4] AS values_1 UNION ALL
  SELECT 'b', [10, 4, 16] UNION ALL
  SELECT 'c', [6, NULL] UNION ALL
  SELECT 'd', [NULL, NULL]
), temp AS (
  -- Rebuild each array without its NULL elements
  SELECT id, ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS values_1
  FROM `project.dataset.table`
)
SELECT id,
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() AS median FROM UNNEST(values_1) AS i) AS median_values_1
FROM temp
with output

Row  id  avg_values_1  median_values_1
1    a   3.0           3.0
2    b   10.0          10.0
3    c   6.0           6.0
4    d   null          null
Note that I had to first introduce the temp CTE to eliminate NULL elements from the arrays
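To see what that filtering step does on its own, here is a small sketch that compares the array length before and after the NULL elements are dropped. It reuses rows c and d from the sample data above; the output column names total_elements and non_null_elements are made up for illustration.

```sql
#standardSQL
-- Illustrative sketch only: inline rows mirror 'c' and 'd' from the sample data.
WITH `project.dataset.table` AS (
  SELECT 'c' AS id, [6, NULL] AS values_1 UNION ALL
  SELECT 'd', [NULL, NULL]
)
SELECT id,
  -- ARRAY_LENGTH counts every element, NULLs included
  ARRAY_LENGTH(values_1) AS total_elements,
  -- Same filter as in the temp CTE: keep only the non-NULL elements, then count them
  ARRAY_LENGTH(ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL)) AS non_null_elements
FROM `project.dataset.table`
```

Row c should report 2 total and 1 non-NULL element, and row d 2 total and 0 non-NULL. For d the filtered array is empty, so the scalar subqueries in the main query find no rows and return NULL.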
You can repeat this construct for as many columns as you need or have
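As a rough sketch of that repetition, the query below applies the same pattern to a second column. The values_2 arrays are invented sample data (the question does not show them), so treat the inputs as assumptions rather than as the asker's real table.

```sql
#standardSQL
-- Sketch: the same filter / AVG / PERCENTILE_CONT pattern, repeated per column.
-- The values_2 arrays are hypothetical sample data, not taken from the question.
WITH `project.dataset.table` AS (
  SELECT 'a' AS id, [2, 4] AS values_1, [1, 3, 5] AS values_2 UNION ALL
  SELECT 'b', [10, 4, 16], [8, NULL] UNION ALL
  SELECT 'c', [6, NULL], [7, 9] UNION ALL
  SELECT 'd', [NULL, NULL], [NULL, 2]
), temp AS (
  -- One NULL-stripping expression per array column
  SELECT id,
    ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS values_1,
    ARRAY(SELECT i FROM UNNEST(values_2) AS i WHERE i IS NOT NULL) AS values_2
  FROM `project.dataset.table`
)
SELECT id,
  -- One AVG subquery and one median subquery per array column
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() FROM UNNEST(values_1) AS i) AS median_values_1,
  (SELECT AVG(i) FROM UNNEST(values_2) AS i) AS avg_values_2,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() FROM UNNEST(values_2) AS i) AS median_values_2
FROM temp
```

Each extra column costs one line in the temp CTE and two lines in the final SELECT, which stays manageable for a handful of columns.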
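Or, if you have more than just a few columns, you can use the approach shown in https://stackoverflow.com/a/63105643/5221944 to dynamically build and execute the query for all columns in one shot!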