
How to calculate average and median of all the values of an array column in BigQuery?

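I have a table `data` with array-type columns. For each row of the table, I want to calculate the average and median of the elements in each of the corresponding `values_*` columns.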

Sample table `data`:

id   values_1        values_2
a    [2, 4]
b    [10, 4, 16]
c    [NULL, 6]
d    [NULL, NULL]

Sample expected output:

id   avg_values_1   median_values_1    avg_values_2   median_values_2
a         3              3
b        10             10
c         6              6
d        NULL           NULL

Below is for BigQuery Standard SQL:

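#standardSQL
WITH temp AS (
  -- Rebuild the array without its NULL elements
  SELECT id, ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS values_1
  FROM `project.dataset.table`
)
SELECT id,
  -- Average of the array elements (NULL for an empty array)
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  -- PERCENTILE_CONT is an analytic function, so DISTINCT collapses its per-element result to one row
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() AS median FROM UNNEST(values_1) AS i) AS median_values_1
FROM temp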

You can test this with the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  -- Inline sample data standing in for the real table
  SELECT 'a' AS id, [2, 4] AS values_1 UNION ALL
  SELECT 'b', [10, 4, 16] UNION ALL
  SELECT 'c', [6, NULL] UNION ALL
  SELECT 'd', [NULL, NULL]
), temp AS (
  -- Rebuild the array without its NULL elements
  SELECT id, ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS values_1
  FROM `project.dataset.table`
)
SELECT id,
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() AS median FROM UNNEST(values_1) AS i) AS median_values_1
FROM temp

with output:

Row id  avg_values_1    median_values_1  
1   a   3.0             3.0  
2   b   10.0            10.0     
3   c   6.0             6.0  
4   d   null            null       

Note that I had to introduce the `temp` CTE first to eliminate NULL elements from the arrays.

You can repeat this construct for as many columns as you need or have.
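As a rough sketch of what repeating the construct might look like for two columns: the query below assumes a second array column named `values_2`. The question mentions `values_*` columns but shows no data for `values_2`, so the `values_2` sample values here are invented purely for illustration.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- values_1 is taken from the question; values_2 is made-up sample data (assumption)
  SELECT 'a' AS id, [2, 4] AS values_1, [1, 3] AS values_2 UNION ALL
  SELECT 'b', [10, 4, 16], [5, NULL, 7] UNION ALL
  SELECT 'c', [6, NULL], [8] UNION ALL
  SELECT 'd', [NULL, NULL], [NULL, NULL]
), temp AS (
  -- Strip NULL elements from each array column separately
  SELECT id,
    ARRAY(SELECT i FROM UNNEST(values_1) AS i WHERE i IS NOT NULL) AS values_1,
    ARRAY(SELECT i FROM UNNEST(values_2) AS i WHERE i IS NOT NULL) AS values_2
  FROM `project.dataset.table`
)
SELECT id,
  -- Same pair of subqueries as before, once per column
  (SELECT AVG(i) FROM UNNEST(values_1) AS i) AS avg_values_1,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() FROM UNNEST(values_1) AS i) AS median_values_1,
  (SELECT AVG(i) FROM UNNEST(values_2) AS i) AS avg_values_2,
  (SELECT DISTINCT PERCENTILE_CONT(i, 0.5) OVER() FROM UNNEST(values_2) AS i) AS median_values_2
FROM temp
```

Each extra column adds one line to the `temp` CTE and two lines to the final SELECT, which is manageable for a handful of columns but gets repetitive quickly.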

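Or, if you have more than just a few columns, you can use the approach shown in https://stackoverflow.com/a/63105643/5221944 to dynamically build and execute the query for all columns at once!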
