LIMIT per group - Google BigQuery/Standard SQL
I have a table like the following (sample data shown here):
CREATE TABLE topics (
name varchar(64),
url varchar(253),
statistic integer,
pubdate timestamp
);
INSERT INTO topics VALUES
('a', 'b', 100, TIMESTAMP '2011-05-16 15:36:38'),
('a', 'c', 110, TIMESTAMP '2014-04-01 00:00:00'),
('a', 'd', 120, TIMESTAMP '2014-04-01 00:00:00'),
('a', 'e', 90, TIMESTAMP '2011-05-16 15:36:38'),
('a', 'f', 80, TIMESTAMP '2014-04-01 00:00:00'),
('a', 'g', 70, TIMESTAMP '2011-05-16 15:36:38'),
('a', 'h', 150, TIMESTAMP '2014-04-01 00:00:00'),
('a', 'i', 50, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'j', 10, TIMESTAMP '2014-04-01 00:00:00'),
('b', 'k', 11, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'l', 12, TIMESTAMP '2014-04-01 00:00:00'),
('b', 'm', 9, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'n', 8, TIMESTAMP '2014-04-01 00:00:00'),
('b', 'o', 7, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'p', 15, TIMESTAMP '2014-04-01 00:00:00'),
('b', 'q', 5, TIMESTAMP '2011-05-16 15:36:38'),
('b', 'r', 2, TIMESTAMP '2014-04-01 00:00:00')
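I want to take the top two rows, ranked by their statistic value, from each (name, date(pubdate)) combination.
In other words, I want something like GROUP BY name, date(pubdate), but instead of applying an aggregate function I simply want the two rows with the highest statistic in each group. (So I know this is not really a GROUP BY but a "greatest-n-per-group" problem.)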
I am using Google BigQuery with Standard SQL. I have looked at many other solutions but am not sure how to get the result in this case.
Desired result:
name url statistic date
a b 100 2011-05-16
a e 90 2011-05-16
a h 150 2014-04-01
a d 120 2014-04-01
b m 9 2011-05-16
b k 11 2011-05-16
b l 12 2014-04-01
b p 15 2014-04-01
Use the ARRAY_AGG function:
SELECT
name,
DATE(pubdate) AS pubdate,
ARRAY_AGG(STRUCT(url, statistic) ORDER BY statistic DESC LIMIT 2) AS top_urls
FROM dataset.table
GROUP BY name, DATE(pubdate)  -- group by the expression: the alias pubdate is ambiguous with the source column
You can combine a subquery with UNNEST to get plain rows, rather than arrays, as output:
SELECT name, pubdate, url, statistic
FROM (
SELECT
name,
DATE(pubdate) AS pubdate,
ARRAY_AGG(STRUCT(url, statistic) ORDER BY statistic DESC LIMIT 2) AS top_urls
FROM dataset.table
GROUP BY name, DATE(pubdate)  -- group by the expression: the alias pubdate is ambiguous with the source column
), UNNEST(top_urls)
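If the position of each row inside its group is also useful, UNNEST can report it. The following is a minimal sketch, not part of the original answer: the alias names t, pos, and rank_in_group are made up for illustration, and the table name reuses the `project.dataset.table` placeholder that appears elsewhere in this thread. Because ARRAY_AGG builds each array in descending statistic order, the zero-based offset of an element corresponds to its rank in the group.

```sql
#standardSQL
-- Sketch: expose the rank of each row within its (name, day) group.
-- t, pos and rank_in_group are illustrative names, not from the original answer.
SELECT name, day, t.url, t.statistic, pos + 1 AS rank_in_group
FROM (
  SELECT name, DATE(pubdate) day,
    ARRAY_AGG(STRUCT(url, statistic) ORDER BY statistic DESC LIMIT 2) arr
  FROM `project.dataset.table`
  GROUP BY name, day
), UNNEST(arr) AS t WITH OFFSET AS pos  -- pos is 0 for the top row, 1 for the second
ORDER BY name, day, rank_in_group
```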
Below is for BigQuery Standard SQL
#standardSQL
SELECT * EXCEPT(arr) FROM (
SELECT name, DATE(pubdate) day,
ARRAY_AGG(STRUCT(url, statistic) ORDER BY statistic DESC LIMIT 2) arr
FROM `project.dataset.table`
GROUP BY name, day
), UNNEST(arr)
-- ORDER BY name, day
You can test and play with the above using the sample data from your question, as in the example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'a' name, 'b' url, 100 statistic, TIMESTAMP '2011-05-16 15:36:38' pubdate UNION ALL
SELECT 'a', 'c', 110, '2014-04-01 00:00:00' UNION ALL
SELECT 'a', 'd', 120, '2014-04-01 00:00:00' UNION ALL
SELECT 'a', 'e', 90, '2011-05-16 15:36:38' UNION ALL
SELECT 'a', 'f', 80, '2014-04-01 00:00:00' UNION ALL
SELECT 'a', 'g', 70, '2011-05-16 15:36:38' UNION ALL
SELECT 'a', 'h', 150, '2014-04-01 00:00:00' UNION ALL
SELECT 'a', 'i', 50, '2011-05-16 15:36:38' UNION ALL
SELECT 'b', 'j', 10, '2014-04-01 00:00:00' UNION ALL
SELECT 'b', 'k', 11, '2011-05-16 15:36:38' UNION ALL
SELECT 'b', 'l', 12, '2014-04-01 00:00:00' UNION ALL
SELECT 'b', 'm', 9, '2011-05-16 15:36:38' UNION ALL
SELECT 'b', 'n', 8, '2014-04-01 00:00:00' UNION ALL
SELECT 'b', 'o', 7, '2011-05-16 15:36:38' UNION ALL
SELECT 'b', 'p', 15, '2014-04-01 00:00:00' UNION ALL
SELECT 'b', 'q', 5, '2011-05-16 15:36:38' UNION ALL
SELECT 'b', 'r', 2, '2014-04-01 00:00:00'
)
SELECT * EXCEPT(arr) FROM (
SELECT name, DATE(pubdate) day,
ARRAY_AGG(STRUCT(url, statistic) ORDER BY statistic DESC LIMIT 2) arr
FROM `project.dataset.table`
GROUP BY name, day
), UNNEST(arr)
ORDER BY name, day
Result
Row name day url statistic
1 a 2011-05-16 b 100
2 a 2011-05-16 e 90
3 a 2014-04-01 h 150
4 a 2014-04-01 d 120
5 b 2011-05-16 k 11
6 b 2011-05-16 m 9
7 b 2014-04-01 p 15
8 b 2014-04-01 l 12
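One detail the sample data does not exercise is ties: if two rows in the same group share the same statistic, ORDER BY statistic DESC alone does not determine which of them survives LIMIT 2. A hedged sketch of one way to make the choice repeatable is to add a secondary sort key. Using url as the tie-breaker is an assumption made here for illustration; any column that is unique within a group would serve.

```sql
#standardSQL
-- Sketch: same query shape as above, with url added as a tie-breaker
-- (assumption: url is unique within each name/day group).
SELECT * EXCEPT(arr) FROM (
  SELECT name, DATE(pubdate) day,
    ARRAY_AGG(STRUCT(url, statistic) ORDER BY statistic DESC, url LIMIT 2) arr
  FROM `project.dataset.table`
  GROUP BY name, day
), UNNEST(arr)
ORDER BY name, day, statistic DESC
```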
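-- Number the rows within each (name, date(pubdate)) group by statistic, highest first,
-- then keep only the first two rows of every group.
with xx as (
select name, url, statistic, pubdate, row_number() over(partition by name, date(pubdate) order by statistic desc) rn
from topics)
select * except(rn)
from xx
where rn <= 2;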
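The same row-numbering idea can probably be written without the CTE by filtering on the window function directly with BigQuery's QUALIFY clause. This is a sketch under a few assumptions rather than a tested answer from the thread: it partitions by name and DATE(pubdate) as the question asks, it assumes the unqualified table name topics resolves in your default dataset, and it keeps a WHERE TRUE in case QUALIFY has to be accompanied by a WHERE, GROUP BY, or HAVING clause in your environment.

```sql
#standardSQL
-- Sketch: keep the two highest-statistic rows per (name, day) using QUALIFY.
SELECT name, url, statistic, DATE(pubdate) AS day
FROM topics
WHERE TRUE  -- assumption: harmless filter, kept in case QUALIFY requires an accompanying clause
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY name, DATE(pubdate)
  ORDER BY statistic DESC, url  -- url is an assumed tie-breaker
) <= 2
ORDER BY name, day, statistic DESC
```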