How to calculate median over multiple columns in Google BigQuery?

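I am writing a query to calculate the median number of visits per day for each of two different websites.

The output should look like this: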

+------------+---------+---------------+
|    date    | website | median_visits |
+------------+---------+---------------+
| 2019-04-01 | A       | median_value  |
| 2019-04-01 | B       | median_value  |
| 2019-04-02 | A       | median_value  |
| 2019-04-02 | B       | median_value  |
| 2019-04-03 | A       | median_value  |
| 2019-04-03 | B       | median_value  |
+------------+---------+---------------+

This is what my table (which has 20,000 rows) looks like:

+------------+---------+--------+
|    date    | website | visits |
+------------+---------+--------+
| 2019-04-01 | A       |   10.0 |
| 2019-04-01 | B       |   14.0 |
| 2019-04-02 | A       |   85.0 |
| 2019-04-03 | A       |   75.0 |
| 2019-04-02 | B       |    3.0 |
| 2019-04-02 | B       |   45.0 |
| 2019-04-01 | A       |   12.0 |
| 2019-04-03 | A       |   44.0 |
| 2019-04-01 | A       |   99.0 |
+------------+---------+--------+

What is the most efficient way to query the desired output? I am currently using:

SELECT DISTINCT date, website, median_visits
FROM (
  SELECT date, website,
    PERCENTILE_CONT(visits, 0.5) OVER (PARTITION BY date, website) AS median_visits
  FROM table
)

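Below is for BigQuery Standard SQL. I cannot say it is the best way, and I cannot even guarantee it is better, but in my testing I saw a better execution plan and slot usage. So you can try it and see how it behaves on your data.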

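#standardSQL
SELECT date, website,
  -- PERCENTILE_CONT is an analytic function, so it returns the same median
  -- for every array element; LIMIT 1 keeps a single value per group.
  (SELECT PERCENTILE_CONT(visit, 0.5) OVER()
    FROM UNNEST(visits) visit LIMIT 1
  ) AS median_visits
FROM (
  -- Collect all visits for each (date, website) pair into one array.
  SELECT date, website, ARRAY_AGG(visits) visits
  FROM `project.dataset.table`
  GROUP BY date, website
)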
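
To sanity-check what either query should return, here is a small Python sketch that mirrors the same two steps (collect visits per date and website, then take the median of each group) on the nine sample rows from the question. It is only a local check, not BigQuery code. The function name `median_visits` is made up for illustration, and it assumes `statistics.median` agrees with `PERCENTILE_CONT(..., 0.5)`, which should hold because both average the two middle values when a group has an even number of rows.

```python
from collections import defaultdict
from statistics import median

# Sample rows copied from the question's table: (date, website, visits).
rows = [
    ("2019-04-01", "A", 10.0),
    ("2019-04-01", "B", 14.0),
    ("2019-04-02", "A", 85.0),
    ("2019-04-03", "A", 75.0),
    ("2019-04-02", "B", 3.0),
    ("2019-04-02", "B", 45.0),
    ("2019-04-01", "A", 12.0),
    ("2019-04-03", "A", 44.0),
    ("2019-04-01", "A", 99.0),
]


def median_visits(rows):
    """Hypothetical helper: median of visits per (date, website)."""
    # Step 1: gather visits per group, like ARRAY_AGG ... GROUP BY.
    groups = defaultdict(list)
    for date, website, visits in rows:
        groups[(date, website)].append(visits)
    # Step 2: one median per group, like PERCENTILE_CONT(visit, 0.5).
    return {key: median(values) for key, values in sorted(groups.items())}


result = median_visits(rows)
for (date, website), value in result.items():
    print(date, website, value)
```

On the sample rows this gives 12.0 for website A on 2019-04-01 and 24.0 for website B on 2019-04-02. The sample has no rows for website B on 2019-04-03, so that group does not appear in the result.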

