简体   繁体   中英

How Can I Concatenate Different Rows In a Single Cell Using Row_number function Without Using Multiple Joins

I have Id, event_name, Timestamp columns.

I will concatenate last 5 event of every id.

I've used multiple WITH and JOIN to acquaire this. But BigQuery took so much time to compute.

Also this feels like such a bad practice. What can i use as an alternative?

My table looks something like this

ID event_name timestamp
A1 a 2022-10-21 12:10:00 UTC
A1 b 2022-10-21 12:12:00 UTC
A1 c 2022-10-21 12:15:00 UTC
A1 d 2022-10-21 12:16:00 UTC
A1 e 2022-10-21 12:28:00 UTC
A1 f 2022-10-21 12:45:00 UTC
B2 c 2022-10-21 10:12:00 UTC
B2 f 2022-10-21 11:12:00 UTC
B2 b 2022-10-21 11:25:00 UTC
B2 a 2022-10-21 11:26:00 UTC
B2 f 2022-10-21 15:32:00 UTC
B2 c 2022-10-21 15:32:48 UTC
B2 f 2022-10-21 15:36:00 UTC

My code looks like this.

WITH a AS ( id, timestamp, event_name, row_number() over(partition by ID ORDER BY timestamp DESC) as row_n

FROM my_table),

WITH b AS (id, timestamp, event_name, row_n

FROM a
WHERE row_n <= 5),

e1 AS(
    SELECT ID, timestamp AS ev1
    FROM b
    WHERE row_n = 1),
    e2 AS(
    SELECT ID, timestamp AS ev2
    FROM b
    WHERE row_n = 2),
    e3 AS(
    SELECT ID, timestamp AS ev3
    FROM b
    WHERE row_n = 3),
    e4 AS(
    SELECT ID,  timestamp AS ev4
    FROM b
    WHERE row_n = 4),
    e5 AS(
    SELECT ID, timestamp AS ev5
    FROM b
    WHERE row_n = 5),

concat_prep AS(
  SELECT b.ID, ev1,ev2,ev3,ev4,ev5
  FROM b
  LEFT JOIN e1 
  ON b.ID = e1.ID
  LEFT JOIN e2
  ON e1.ID = e2.ID
  LEFT JOIN e3
  ON e2.ID = e3.ID
  LEFT JOIN e4
  ON e3.ID = e4.ID
  LEFT JOIN e5
  ON e4.ID= e5.ID)


  SELECT ID, concat(ev1,',',ev2,',',ev3,',',ev4,',',ev5) as concatt
  FROM concat_prep
  GROUP BY ID ,concat(ev1,',',ev2,',',ev3,',',ev4,',',ev5)

And my output should look like this:

ID concat
A1 f,e,d,c,b
B2 f,c,f,a,b

How can I optimize it? (I've already filtered by date) this query is part of a bigger query.

Please build an array in a group by . There is an option to limit the elements.

With tbl as (select * , rand() as timestamp from unnest(["A1","B2"]) ID,unnest(split("a b c d e f g"," ")) event_name)

SELECT
ID,
#array_Agg(event_name order by  timestamp limit 5),
string_Agg(event_name order by  timestamp desc limit 5) as concat

from tbl
group by 1

Yet, you can use ROW_NUMBER() window function while DENSE_RANK() might rather be preferable as it arbitrarily would render including the ties(equality) of timestamp values.

So an option would be using:

WITH t AS
(
  SELECT t.*, DENSE_RANK() OVER (PARTITION BY id ORDER BY timestamp DESC) AS dr
    FROM my_table AS t
)
SELECT ID, STRING_AGG(event_name ORDER BY dr LIMIT 5) AS `concat`
  FROM t
 GROUP BY ID

where a comma separated string would be generated by default(unless and optional argument delimiter provided after expression [the column event_name in this case] ) by using STRING_AGG() concatenation function

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM