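How can I concatenate different rows in a single cell using the ROW_NUMBER function without using multiple joins?
I have the columns ID, event_name and timestamp.
I want to concatenate the last 5 events for each ID.
I got it working with multiple WITH clauses and JOINs, but BigQuery takes a long time to compute the result.
It also feels like bad practice. What can I use as an alternative?
My table looks like this: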
ID | event_name | timestamp |
---|---|---|
A1 | a | 2022-10-21 12:10:00 UTC |
A1 | b | 2022-10-21 12:12:00 UTC |
A1 | c | 2022-10-21 12:15:00 UTC |
A1 | d | 2022-10-21 12:16:00 UTC |
A1 | e | 2022-10-21 12:28:00 UTC |
A1 | f | 2022-10-21 12:45:00 UTC |
B2 | c | 2022-10-21 10:12:00 UTC |
B2 | f | 2022-10-21 11:12:00 UTC |
B2 | b | 2022-10-21 11:25:00 UTC |
B2 | a | 2022-10-21 11:26:00 UTC |
B2 | f | 2022-10-21 15:32:00 UTC |
B2 | c | 2022-10-21 15:32:48 UTC |
B2 | f | 2022-10-21 15:36:00 UTC |
My code looks like this:
WITH a AS (
  SELECT id, timestamp, event_name,
         ROW_NUMBER() OVER (PARTITION BY ID ORDER BY timestamp DESC) AS row_n
  FROM my_table),
b AS (
  SELECT id, timestamp, event_name, row_n
  FROM a
  WHERE row_n <= 5),
-- One CTE per position: evN holds the N-th most recent event name for each ID
e1 AS(
SELECT ID, event_name AS ev1
FROM b
WHERE row_n = 1),
e2 AS(
SELECT ID, event_name AS ev2
FROM b
WHERE row_n = 2),
e3 AS(
SELECT ID, event_name AS ev3
FROM b
WHERE row_n = 3),
e4 AS(
SELECT ID, event_name AS ev4
FROM b
WHERE row_n = 4),
e5 AS(
SELECT ID, event_name AS ev5
FROM b
WHERE row_n = 5),
concat_prep AS(
SELECT b.ID, ev1,ev2,ev3,ev4,ev5
FROM b
LEFT JOIN e1
ON b.ID = e1.ID
LEFT JOIN e2
ON e1.ID = e2.ID
LEFT JOIN e3
ON e2.ID = e3.ID
LEFT JOIN e4
ON e3.ID = e4.ID
LEFT JOIN e5
ON e4.ID= e5.ID)
SELECT ID, concat(ev1,',',ev2,',',ev3,',',ev4,',',ev5) as concatt
FROM concat_prep
GROUP BY ID ,concat(ev1,',',ev2,',',ev3,',',ev4,',',ev5)
And my output should look like this:
ID | concat |
---|---|
A1 | f,e,d,c,b |
B2 | f,c,f,a,b |
How can I optimize it? (I have already filtered by date.) This query is part of a bigger query.
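Aggregate within the GROUP BY instead. Both ARRAY_AGG and STRING_AGG accept ORDER BY and LIMIT inside the call, so you can limit the number of elements per group.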
WITH tbl AS (
  -- Demo data: two IDs crossed with seven events; RAND() stands in for a real timestamp
  SELECT *, RAND() AS timestamp
  FROM UNNEST(["A1", "B2"]) AS ID,
       UNNEST(SPLIT("a b c d e f g", " ")) AS event_name
)
SELECT
  ID,
  -- To get an array instead: ARRAY_AGG(event_name ORDER BY timestamp DESC LIMIT 5)
  STRING_AGG(event_name ORDER BY timestamp DESC LIMIT 5) AS concat
FROM tbl
GROUP BY 1
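To check this against the sample data from the question, here is a minimal self-contained sketch. The `my_table` CTE is only an inline stand-in for the real table, built with UNNEST so the query can be run as-is; the explicit `','` delimiter is optional and shown just for clarity.

```sql
WITH my_table AS (
  -- Inline copy of the sample rows from the question (stand-in for the real table)
  SELECT * FROM UNNEST([
    STRUCT('A1' AS ID, 'a' AS event_name, TIMESTAMP '2022-10-21 12:10:00 UTC' AS timestamp),
    ('A1', 'b', TIMESTAMP '2022-10-21 12:12:00 UTC'),
    ('A1', 'c', TIMESTAMP '2022-10-21 12:15:00 UTC'),
    ('A1', 'd', TIMESTAMP '2022-10-21 12:16:00 UTC'),
    ('A1', 'e', TIMESTAMP '2022-10-21 12:28:00 UTC'),
    ('A1', 'f', TIMESTAMP '2022-10-21 12:45:00 UTC'),
    ('B2', 'c', TIMESTAMP '2022-10-21 10:12:00 UTC'),
    ('B2', 'f', TIMESTAMP '2022-10-21 11:12:00 UTC'),
    ('B2', 'b', TIMESTAMP '2022-10-21 11:25:00 UTC'),
    ('B2', 'a', TIMESTAMP '2022-10-21 11:26:00 UTC'),
    ('B2', 'f', TIMESTAMP '2022-10-21 15:32:00 UTC'),
    ('B2', 'c', TIMESTAMP '2022-10-21 15:32:48 UTC'),
    ('B2', 'f', TIMESTAMP '2022-10-21 15:36:00 UTC')
  ])
)
SELECT
  ID,
  -- Newest first, keep at most 5 events per ID, join with commas
  STRING_AGG(event_name, ',' ORDER BY timestamp DESC LIMIT 5) AS concat
FROM my_table
GROUP BY ID
ORDER BY ID
-- A1 | f,e,d,c,b
-- B2 | f,c,f,a,b
```

Because the ordering and the limit both live inside the aggregate, no ROW_NUMBER() CTE and no self-joins are needed.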
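You can still use the ROW_NUMBER() window function, but DENSE_RANK() may be preferable: it gives rows with tied (equal) timestamp values the same rank, whereas ROW_NUMBER() numbers tied rows in an arbitrary order.
So one option is: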
WITH t AS
(
SELECT t.*, DENSE_RANK() OVER (PARTITION BY id ORDER BY timestamp DESC) AS dr
FROM my_table AS t
)
SELECT ID, STRING_AGG(event_name ORDER BY dr LIMIT 5) AS `concat`
FROM t
GROUP BY ID
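STRING_AGG() produces a comma-separated string by default, unless an optional delimiter argument is supplied after the expression (the event_name column in this case). Note that LIMIT 5 caps the number of concatenated elements, not the number of ranks, so the order among rows that share a rank is still not deterministic.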
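To see how the two window functions differ on ties, here is a small sketch. The `demo` rows are hypothetical: the question's own data contains no equal timestamps, so one tie has been made up for illustration.

```sql
WITH demo AS (
  -- Hypothetical rows: 'c' and 'f' deliberately share the same timestamp
  SELECT * FROM UNNEST([
    STRUCT('B2' AS ID, 'f' AS event_name, TIMESTAMP '2022-10-21 15:36:00 UTC' AS timestamp),
    ('B2', 'c', TIMESTAMP '2022-10-21 15:32:00 UTC'),
    ('B2', 'f', TIMESTAMP '2022-10-21 15:32:00 UTC'),
    ('B2', 'a', TIMESTAMP '2022-10-21 11:26:00 UTC')
  ])
)
SELECT
  ID,
  event_name,
  timestamp,
  -- rn is always 1,2,3,4, but which tied row gets 2 and which gets 3 is arbitrary
  ROW_NUMBER() OVER (PARTITION BY ID ORDER BY timestamp DESC) AS rn,
  -- dr is 1,2,2,3: both tied rows share rank 2 and no rank is skipped
  DENSE_RANK() OVER (PARTITION BY ID ORDER BY timestamp DESC) AS dr
FROM demo
ORDER BY ID, rn
```

If a stable result matters when timestamps collide, adding a second sort key (for example `ORDER BY timestamp DESC, event_name`) inside the aggregate is one way to make the order deterministic.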