How to calculate the median over multiple columns in Google BigQuery?

Question

I'm creating a query to calculate the median number of visits for each of two websites, by day.

The output should look like the following:

+------------+---------+---------------+
|    date    | website | median_visits |
+------------+---------+---------------+
| 2019-04-01 | A       | median_value  |
| 2019-04-01 | B       | median_value  |
| 2019-04-02 | A       | median_value  |
| 2019-04-02 | B       | median_value  |
| 2019-04-03 | A       | median_value  |
| 2019-04-03 | B       | median_value  |
+------------+---------+---------------+

Here is what my table looks like (it has 20,000 rows; a sample is shown):

+------------+---------+--------+
|    date    | website | visits |
+------------+---------+--------+
| 2019-04-01 | A       |   10.0 |
| 2019-04-01 | B       |   14.0 |
| 2019-04-02 | A       |   85.0 |
| 2019-04-03 | A       |   75.0 |
| 2019-04-02 | B       |    3.0 |
| 2019-04-02 | B       |   45.0 |
| 2019-04-01 | A       |   12.0 |
| 2019-04-03 | A       |   44.0 |
| 2019-04-01 | A       |   99.0 |
+------------+---------+--------+

What would be the most efficient way to query for the desired output? I am currently using:

SELECT DISTINCT date, website, median_visits
FROM
 (SELECT  date, website, PERCENTILE_CONT(visits, 0.5) 
  OVER(PARTITION BY date, website) AS median_visits
  FROM table)
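To make it concrete what the query above computes on the nine sample rows, here is a small Python model of it. This is only an offline illustration, not BigQuery: `percentile_cont` is a hypothetical helper that follows the linear-interpolation definition PERCENTILE_CONT uses, and it assumes non-empty groups with no NULLs. In this model the window value is produced once per input row and DISTINCT then discards the duplicates.

```python
from math import ceil, floor

# Sample rows from the question: (date, website, visits).
ROWS = [
    ("2019-04-01", "A", 10.0),
    ("2019-04-01", "B", 14.0),
    ("2019-04-02", "A", 85.0),
    ("2019-04-03", "A", 75.0),
    ("2019-04-02", "B", 3.0),
    ("2019-04-02", "B", 45.0),
    ("2019-04-01", "A", 12.0),
    ("2019-04-03", "A", 44.0),
    ("2019-04-01", "A", 99.0),
]


def percentile_cont(values, p):
    """Hypothetical helper: linear-interpolation percentile (assumes no NULLs, non-empty)."""
    s = sorted(values)
    rank = p * (len(s) - 1)
    lo, hi = floor(rank), ceil(rank)
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)


def window_then_distinct(rows):
    """Model of the question's query: one window result per input row, then DISTINCT."""
    per_row = []
    for date, site, _ in rows:
        # PARTITION BY date, website: the window sees every row sharing this key.
        partition = [v for d, w, v in rows if (d, w) == (date, site)]
        per_row.append((date, site, percentile_cont(partition, 0.5)))
    # SELECT DISTINCT collapses the repeated per-row results.
    return sorted(set(per_row)), len(per_row)


result, rows_before_distinct = window_then_distinct(ROWS)
for row in result:
    print(row)
print("rows before DISTINCT:", rows_before_distinct)
```

On the sample data this yields five distinct rows, built from nine per-row results.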

Answer 1

Below is a solution for BigQuery Standard SQL. I cannot claim it is the best, or even guarantee that it is better, but in my testing it produced a better execution plan and lower slot usage. Try it with your data and compare.

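#standardSQL
SELECT date, website,
  -- median of each group's array; OVER() spans the whole array, so every
  -- row of the subquery carries the same value and LIMIT 1 keeps one of them
  (SELECT PERCENTILE_CONT(visit, 0.5) OVER()
    FROM UNNEST(visits) visit LIMIT 1
  ) AS median_visits
FROM (
  -- collect the visit counts of each (date, website) group into one array
  SELECT date, website, ARRAY_AGG(visits) visits
  FROM `project.dataset.table`
  GROUP BY date, website
)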
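For comparison, here is the same kind of Python model of the query above. Again this is only an offline illustration, not how BigQuery executes it, and `percentile_cont` is a hypothetical helper using PERCENTILE_CONT's linear-interpolation definition (assuming no NULLs). Rows are first grouped into one list per `(date, website)`, as `GROUP BY` with `ARRAY_AGG` does, and the median is then computed once per list, so no DISTINCT is needed.

```python
from collections import defaultdict
from math import ceil, floor

# Sample rows from the question: (date, website, visits).
ROWS = [
    ("2019-04-01", "A", 10.0),
    ("2019-04-01", "B", 14.0),
    ("2019-04-02", "A", 85.0),
    ("2019-04-03", "A", 75.0),
    ("2019-04-02", "B", 3.0),
    ("2019-04-02", "B", 45.0),
    ("2019-04-01", "A", 12.0),
    ("2019-04-03", "A", 44.0),
    ("2019-04-01", "A", 99.0),
]


def percentile_cont(values, p):
    """Hypothetical helper: linear-interpolation percentile (assumes no NULLs, non-empty)."""
    s = sorted(values)
    rank = p * (len(s) - 1)
    lo, hi = floor(rank), ceil(rank)
    return s[lo] + (s[hi] - s[lo]) * (rank - lo)


def group_then_median(rows):
    """Model of the answer's query: ARRAY_AGG per (date, website), then one median per array."""
    groups = defaultdict(list)
    for date, site, visits in rows:
        # GROUP BY date, website together with ARRAY_AGG(visits)
        groups[(date, site)].append(visits)
    # One median evaluation per group rather than one per input row.
    return sorted((d, w, percentile_cont(v, 0.5)) for (d, w), v in groups.items())


for row in group_then_median(ROWS):
    print(row)
```

Both models return the same five rows for the sample data (for example 24.0 for 2019-04-02/B and 59.5 for 2019-04-03/A); in these models the only difference is how many times the median is evaluated. Whether that translates into lower cost in BigQuery depends on its execution plan, so comparing the two queries on your own data, as suggested above, is still the real test.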

Answer 1 was accepted on 2019-04-23 19:36:40.

如何计算 Google BigQuery 中多列的中位数？

问题描述

1 个解决方案

解决方案1 1 已采纳 2019-04-23 19:36:40

解决方案1
1 已采纳 2019-04-23 19:36:40