
BigQuery: Select top 3 with Group By condition

I have a table like this:

type    Total
 A       100
 A       123
 A       154
 A       50
 A       54
 B       200
 B       166
 B       423
 B       342
 B       213
 C       520
 C       130
 C       234
 C       512

I want to select the top 3 totals per group. How can I do it?

ROW_NUMBER() is fine. A fun way to do this in BigQuery is:

select type,
       array_agg(total order by total desc limit 3) as top3
from t
group by type;

This puts the values into an array.

In most big-data use cases, ROW_NUMBER() is not fine, because it can end with a "resources exceeded" error. It requires all rows of the same group to be present on a single node, which, when the data is skewed, leads to that error in BigQuery.

Option 1

One of the usual ways to address this issue is to use the ARRAY_AGG() function, as in the example below:

#standardSQL
SELECT type, total FROM (
  SELECT type, ARRAY_AGG(total ORDER BY total DESC LIMIT 3) arr
  FROM `project.dataset.table` GROUP BY type
), UNNEST(arr) total

If you run the above against the sample data from your question:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'A' type, 100 total UNION ALL
  SELECT 'A', 123 UNION ALL
  SELECT 'A', 154 UNION ALL
  SELECT 'A', 50 UNION ALL
  SELECT 'A', 54 UNION ALL
  SELECT 'B', 200 UNION ALL
  SELECT 'B', 166 UNION ALL
  SELECT 'B', 423 UNION ALL
  SELECT 'B', 342 UNION ALL
  SELECT 'B', 213 UNION ALL
  SELECT 'C', 520 UNION ALL
  SELECT 'C', 130 UNION ALL
  SELECT 'C', 234 UNION ALL
  SELECT 'C', 512 
)
SELECT type, total FROM (
  SELECT type, ARRAY_AGG(total ORDER BY total DESC LIMIT 3) arr
  FROM `project.dataset.table` GROUP BY type
), UNNEST(arr) total
-- ORDER BY type   

you will get the expected result:

Row type    total    
1   A       154  
2   A       123  
3   A       100  
4   B       423  
5   B       342  
6   B       213  
7   C       520  
8   C       512  
9   C       234    
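
As a rough picture of why a top-N aggregate can be lighter than ranking every row, here is a minimal Python sketch that streams over the sample rows and keeps at most 3 values per group. It only illustrates the idea; it is not how BigQuery executes ARRAY_AGG, and the function name top_n_per_group is made up for this example.

```python
import heapq
from collections import defaultdict

# Sample rows copied from the question: (type, total)
ROWS = [
    ("A", 100), ("A", 123), ("A", 154), ("A", 50), ("A", 54),
    ("B", 200), ("B", 166), ("B", 423), ("B", 342), ("B", 213),
    ("C", 520), ("C", 130), ("C", 234), ("C", 512),
]

def top_n_per_group(rows, n=3):
    """Return (type, total) pairs holding the n largest totals of each type."""
    heaps = defaultdict(list)  # type -> min-heap with at most n totals
    for type_, total in rows:
        heap = heaps[type_]
        if len(heap) < n:
            heapq.heappush(heap, total)
        else:
            # Push the new total, then drop the smallest of the n + 1 values.
            heapq.heappushpop(heap, total)
    # Emit groups in type order, totals in descending order.
    return [(t, v) for t in sorted(heaps) for v in sorted(heaps[t], reverse=True)]

for type_, total in top_n_per_group(ROWS):
    print(type_, total)
```

The per-group state never grows beyond n values, no matter how many rows a group has.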

Option 2

For really big data, there is another interesting option to consider: the APPROX_TOP_SUM() function, as in the example below:

#standardSQL
SELECT type, value AS total FROM (
  SELECT type, APPROX_TOP_SUM(total, total, 3) arr
  FROM `project.dataset.table` GROUP BY type
), UNNEST(arr)  

With the sample data, this produces the same output as above.
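
One caveat is worth checking before relying on this. As I understand the function, APPROX_TOP_SUM(total, total, 3) ranks each distinct total by the sum of its weights, so a value that repeats can outrank a larger value that appears once, and on large inputs the result is approximate. The Python sketch below is an exact imitation of that ranking rule, not BigQuery's implementation; the helper name top_sum and the second data set are hypothetical.

```python
from collections import Counter

def top_sum(values, weights, number):
    """Rank distinct values by their summed weight and keep the first `number`."""
    sums = Counter()
    for value, weight in zip(values, weights):
        sums[value] += weight
    # Highest summed weight first; ties broken by the larger value.
    ranked = sorted(sums.items(), key=lambda kv: (kv[1], kv[0]), reverse=True)
    return ranked[:number]

# Group A from the question: all totals are distinct, so sum == value.
group_a = [100, 123, 154, 50, 54]
print(top_sum(group_a, group_a, 3))

# Hypothetical group with a repeated total: 10 appears three times (sum 30).
dupes = [10, 10, 10, 25, 20, 5]
print(top_sum(dupes, dupes, 3))
```

In the second case 10 ranks first, although the three largest totals are 25, 20 and 10, so Option 2 matches Option 1 only when totals within a group are distinct.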

You can try using row_number():

select * from
(
select type, total, row_number() over(partition by type order by total desc) as rn
from tablename
)A
where rn<=3
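
To try this query without a BigQuery project, here is a small sketch that runs the same window-function pattern in an in-memory SQLite database from Python. SQLite is only a stand-in here (it needs version 3.25 or later for window functions), the function name is made up, and the ORDER BY is added just to make the output order deterministic.

```python
import sqlite3

# Sample rows copied from the question: (type, total)
ROWS = [
    ("A", 100), ("A", 123), ("A", 154), ("A", 50), ("A", 54),
    ("B", 200), ("B", 166), ("B", 423), ("B", 342), ("B", 213),
    ("C", 520), ("C", 130), ("C", 234), ("C", 512),
]

def top3_with_row_number(rows):
    """Load rows into SQLite and keep the 3 largest totals per type."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tablename (type TEXT, total INTEGER)")
    conn.executemany("INSERT INTO tablename VALUES (?, ?)", rows)
    query = """
        SELECT type, total FROM (
            SELECT type, total,
                   ROW_NUMBER() OVER (PARTITION BY type ORDER BY total DESC) AS rn
            FROM tablename
        ) AS ranked
        WHERE rn <= 3
        ORDER BY type, total DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

for type_, total in top3_with_row_number(ROWS):
    print(type_, total)
```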


 