How to write this query efficiently in PostgreSQL: aggregating consecutive rows (identified by a pair of columns) into arrays?
I have a query like this:
(
SELECT
t1.person_id,
t1.created_at,
't1' AS type,
t1.extra_data AS extra_data
FROM table1 AS t1
)
UNION
(
SELECT
t2.person_id,
t2.created_at,
't2' AS type,
t2.extra_data AS extra_data
FROM table2 AS t2
)
UNION
(
SELECT
t3.person_id,
t3.created_at,
't3' AS type,
t3.extra_data AS extra_data
FROM table3 AS t3
)
ORDER BY created_at DESC;
This produces a result like the following (created_at is a timestamp; I have omitted the actual values and use simple integers to show the ordering):
person_id | type | created_at | extra_data
--------- | ---- | ---------- | ----------
1 | t1 | 9 | a
1 | t1 | 8 | b
2 | t2 | 7 | c
2 | t2 | 6 | c
2 | t2 | 5 | d
1 | t3 | 4 | e
3 | t3 | 3 | f
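I want to group consecutive (person_id, type) pairs, take the largest created_at in each group as the resulting created_at, and aggregate extra_data into an array, i.e. I want to get the following result: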
person_id | type | created_at | extra_data_array
--------- | ---- | ---------- | ----------
1 | t1 | 9 | [a, b]
2 | t2 | 7 | [c, c, d]
1 | t3 | 4 | [e]
3 | t3 | 3 | [f]
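I have tried window functions but could not figure out how to do it.
My questions are:
1) How can I write a single query that achieves this?
2) Can indexes be used to make the query fast?
My concern with the second question is that the base result comes from a UNION query, so I doubt there is any chance of using an index.
Thanks in advance for any help!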
The two aggregate functions you need are MAX and array_agg.
I think indexes are more likely to be used if you GROUP before the UNION, so I would do it like this:
(
SELECT
t1.person_id,
MAX(t1.created_at) AS created_at,
't1' AS type,
array_agg(t1.extra_data) AS extra_data
FROM table1 AS t1
GROUP BY t1.person_id
)
UNION
(
SELECT
t2.person_id,
MAX(t2.created_at) AS created_at,
't2' AS type,
array_agg(t2.extra_data) AS extra_data
FROM table2 AS t2
GROUP BY t2.person_id
)
UNION
(
SELECT
t3.person_id,
MAX(t3.created_at) AS created_at,
't3' AS type,
array_agg(t3.extra_data) AS extra_data
FROM table3 AS t3
GROUP BY t3.person_id
)
ORDER BY created_at DESC;
However, you could also UNION first and then GROUP the result, as long as you GROUP BY person_id, type.
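A minimal sketch of the UNION-first variant follows. Note that GROUP BY person_id, type on its own merges every row of a pair, even when the rows are not adjacent in the created_at ordering. Since the question asks for consecutive runs, this sketch adds a run number computed from the difference of two row_number() calls (the common "gaps and islands" technique), which is an addition and not part of the answer above. The names events, numbered and run_id are hypothetical. It also uses UNION ALL, which keeps duplicate rows that UNION would remove, and it assumes created_at values are distinct, since ties would make the row numbering arbitrary.

```sql
WITH events AS (
    -- Combine the three tables; the literal marks where each row came from.
    SELECT person_id, created_at, 't1' AS type, extra_data FROM table1
    UNION ALL
    SELECT person_id, created_at, 't2' AS type, extra_data FROM table2
    UNION ALL
    SELECT person_id, created_at, 't3' AS type, extra_data FROM table3
),
numbered AS (
    -- Rows in the same consecutive run of (person_id, type) share the same
    -- difference between the overall position and the position within the pair.
    SELECT *,
           row_number() OVER (ORDER BY created_at DESC)
         - row_number() OVER (PARTITION BY person_id, type
                              ORDER BY created_at DESC) AS run_id
    FROM events
)
SELECT person_id,
       type,
       max(created_at) AS created_at,
       array_agg(extra_data ORDER BY created_at DESC) AS extra_data_array
FROM numbered
GROUP BY person_id, type, run_id
ORDER BY created_at DESC;
```

If merging all rows of a pair regardless of adjacency is acceptable, the numbered step and run_id can be dropped, leaving a plain GROUP BY person_id, type over events.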
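A few more notes:
This will still do a full table scan of every table, because you need created_at and extra_data from every row. It is like the interview question "What is the time complexity of printing the nodes of a binary tree?": you have to visit every node, so you cannot do better than linear.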
If you want the arrays ordered, you can use array_agg(t1.extra_data ORDER BY t1.created_at), or order them some other way. This is where an index could help you.
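As a rough illustration of the ordered aggregate for one of the tables, the sketch below pairs it with an index on (person_id, created_at). The index name table1_person_created_idx is hypothetical, and whether the planner actually uses the index depends on the PostgreSQL version and the data, so it is worth checking the plan with EXPLAIN before relying on it.

```sql
-- Hypothetical index; the name and column order are assumptions.
CREATE INDEX IF NOT EXISTS table1_person_created_idx
    ON table1 (person_id, created_at DESC);

-- Same per-table aggregate as above, with the array ordered newest first.
EXPLAIN
SELECT t1.person_id,
       max(t1.created_at) AS created_at,
       't1' AS type,
       array_agg(t1.extra_data ORDER BY t1.created_at DESC) AS extra_data
FROM table1 AS t1
GROUP BY t1.person_id;
```

Remove the EXPLAIN keyword to run the query itself.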