How can I write this query efficiently in PostgreSQL: aggregating consecutive rows (identified by a pair of columns) into arrays?
I have a query like this:
(
SELECT
t1.person_id,
t1.created_at,
't1' AS type,
t1.extra_data AS extra_data
FROM table1 AS t1
)
UNION
(
SELECT
t2.person_id,
t2.created_at,
't2' AS type,
t2.extra_data AS extra_data
FROM table2 AS t2
)
UNION
(
SELECT
t3.person_id,
t3.created_at,
't3' AS type,
t3.extra_data AS extra_data
FROM table3 AS t3
)
ORDER BY created_at DESC;
This produces a result like the following (created_at is a timestamp; I have omitted the actual values and use simple integers to show the ordering):
person_id | type | created_at | extra_data
--------- | ---- | ---------- | ----------
1 | t1 | 9 | a
1 | t1 | 8 | b
2 | t2 | 7 | c
2 | t2 | 6 | c
2 | t2 | 5 | d
1 | t3 | 4 | e
3 | t3 | 3 | f
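I want to group consecutive (person_id, type) pairs, take the largest created_at in each group as the resulting created_at, and aggregate extra_data into an array. That is, I want the following result: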
person_id | type | created_at | extra_data_array
--------- | ---- | ---------- | ----------
1 | t1 | 9 | [a, b]
2 | t2 | 7 | [c, c, d]
1 | t3 | 4 | [e]
3 | t3 | 3 | [f]
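I have tried window functions, but could not figure out how to make them work.
My questions are:
1) How can I write a single query that achieves this?
2) Can indexes be used to make the query fast?
My concern with the second question is that, because the base result is selected from a UNION query, I doubt there is any opportunity to make use of indexes.
Thanks in advance for any help!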
The two aggregate functions you need are MAX and array_agg.
I think indexes have a better chance of being used if you GROUP before the UNION, so I would do it like this:
(
SELECT
t1.person_id,
MAX(t1.created_at) AS created_at,
't1' AS type,
array_agg(t1.extra_data) AS extra_data
FROM table1 AS t1
GROUP BY t1.person_id
)
UNION
(
SELECT
t2.person_id,
MAX(t2.created_at) AS created_at,
't2' AS type,
array_agg(t2.extra_data) AS extra_data
FROM table2 AS t2
GROUP BY t2.person_id
)
UNION
(
SELECT
t3.person_id,
MAX(t3.created_at) AS created_at,
't3' AS type,
array_agg(t3.extra_data) AS extra_data
FROM table3 AS t3
GROUP BY t3.person_id
)
ORDER BY created_at DESC;
However, you could also UNION first and then GROUP the result, as long as you GROUP BY person_id, type.
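One caveat: grouping only by person_id (and type) merges every row of a pair into one group, even if rows of another pair fall in between. If only consecutive runs should be merged, as the question asks, the UNION-first variant can be extended with a "gaps and islands" step. The following is a minimal sketch, not a tested solution. The CTE names events and numbered and the column run_id are made up for illustration. It uses UNION ALL on the assumption that duplicate rows should be kept, and it assumes created_at values are distinct across the three tables (otherwise add a tie-breaker column to both ORDER BY clauses).

```sql
-- Sketch only: table and column names come from the question;
-- events, numbered and run_id are hypothetical names.
WITH events AS (
    SELECT person_id, created_at, 't1' AS type, extra_data FROM table1
    UNION ALL  -- assumption: duplicates should be kept; use UNION to drop them
    SELECT person_id, created_at, 't2' AS type, extra_data FROM table2
    UNION ALL
    SELECT person_id, created_at, 't3' AS type, extra_data FROM table3
),
numbered AS (
    SELECT
        person_id,
        type,
        created_at,
        extra_data,
        -- Position in the overall timeline minus position within the
        -- (person_id, type) partition. The difference stays constant
        -- inside one consecutive run and changes when the run is interrupted.
        ROW_NUMBER() OVER (ORDER BY created_at DESC)
      - ROW_NUMBER() OVER (PARTITION BY person_id, type
                           ORDER BY created_at DESC) AS run_id
    FROM events
)
SELECT
    person_id,
    type,
    MAX(created_at) AS created_at,
    array_agg(extra_data ORDER BY created_at DESC) AS extra_data_array
FROM numbered
GROUP BY person_id, type, run_id
ORDER BY created_at DESC;
```

On the sample data from the question the row-number differences are 0, 0, 2, 2, 2, 5, 6, so the rows fall into four groups, one per consecutive run. Because the window functions need the whole combined timeline, this variant still has to read and sort every row.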
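A few more notes:
This will still do a full table scan on every table, because you need created_at and extra_data from every row. It is like the interview question, "What is the time complexity of printing the nodes of a binary tree?": every node has to be visited, so it cannot be done in less than linear time.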
If you want the array to be ordered, you can use array_agg(t1.extra_data ORDER BY t1.created_at) or some other method. This is where an index could help you.
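As a rough illustration of the ordered aggregate together with an index, here is a sketch for table1 only. The index name and its column choice are assumptions, not something given in the question. DESC is used so that the array order matches the expected output ([a, b] for created_at 9, 8). Whether the planner actually uses the index depends on the PostgreSQL version, table size and statistics, so it is worth checking the plan with EXPLAIN.

```sql
-- Hypothetical index: name and columns are assumptions for illustration.
CREATE INDEX table1_person_created_idx
    ON table1 (person_id, created_at DESC);

-- Per-person aggregate with the array ordered newest first.
-- EXPLAIN shows the plan without running the query, so you can
-- see whether the index is picked up.
EXPLAIN
SELECT
    t1.person_id,
    MAX(t1.created_at) AS created_at,
    't1' AS type,
    array_agg(t1.extra_data ORDER BY t1.created_at DESC) AS extra_data
FROM table1 AS t1
GROUP BY t1.person_id;
```

The same pattern would apply to table2 and table3 before combining the three results.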