
How to write such a query efficiently in PostgreSQL: aggregating consecutive rows (identified by a pair of columns) into arrays?

I have a query like this:

(
    SELECT
        t1.person_id,
        t1.created_at,
        't1' AS type,
        t1.extra_data AS extra_data
    FROM table1 AS t1
)
UNION
(
    SELECT
        t2.person_id,
        t2.created_at,
        't2' AS type,
        t2.extra_data AS extra_data
    FROM table2 AS t2
)
UNION
(
    SELECT
        t3.person_id,
        t3.created_at,
        't3' AS type,
        t3.extra_data AS extra_data
    FROM table3 AS t3
)
ORDER BY created_at DESC;

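This produces a result like the following (created_at is a timestamp; I have omitted the actual values and use simple integers to indicate the order):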

person_id  | type  |  created_at | extra_data
---------  | ----  |  ---------- | ----------
1          | t1    |  9          | a
1          | t1    |  8          | b
2          | t2    |  7          | c
2          | t2    |  6          | c
2          | t2    |  5          | d
1          | t3    |  4          | e
3          | t3    |  3          | f

I want to group consecutive (person_id, type) pairs, take the largest created_at as the final created_at, and aggregate extra_data into an array, i.e. I want the following result:

person_id  | type  |  created_at | extra_data_array
---------  | ----  |  ---------- | ----------------
1          | t1    |  9          | [a, b]
2          | t2    |  7          | [c, c, d]
1          | t3    |  4          | [e]
3          | t3    |  3          | [f]

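I have tried window functions but could not figure out how to make this work.

My questions are:

1) How can I write a single query that achieves this?

2) Can the query use indexes so that it runs fast?

My concern with the second question is that, since the base result is selected from a UNION query, I doubt there is any chance to take advantage of indexes.

Any help is appreciated, thanks!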

The two aggregate functions you need are MAX and array_agg.

I think indexes are more likely to be used if you GROUP before the UNION, so I would do this:

(
    SELECT
        t1.person_id,
        MAX(t1.created_at) AS created_at,
        't1' AS type,
        array_agg(t1.extra_data) AS extra_data
    FROM table1 AS t1
    GROUP BY t1.person_id
)
UNION
(
    SELECT
        t2.person_id,
        MAX(t2.created_at) AS created_at,
        't2' AS type,
        array_agg(t2.extra_data) AS extra_data
    FROM table2 AS t2
    GROUP BY t2.person_id
)
UNION
(
    SELECT
        t3.person_id,
        MAX(t3.created_at) AS created_at,
        't3' AS type,
        array_agg(t3.extra_data) AS extra_data
    FROM table3 AS t3
    GROUP BY t3.person_id
)
ORDER BY created_at DESC;

However, you could also UNION first and then GROUP the result, as long as you GROUP BY person_id, type.
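As a rough sketch of the UNION-first variant just described: the three tables are combined in a subquery (aliased `u` here, a name chosen only for illustration) and aggregated once. It uses UNION ALL rather than UNION so that duplicate rows are kept before aggregating, which is an assumption about the intended behaviour, and it assumes extra_data has the same type in all three tables.

```sql
-- Sketch only: combine first, then aggregate once.
-- "u" is an illustrative alias; UNION ALL (keeping duplicates) is an assumption.
SELECT
    u.person_id,
    MAX(u.created_at)       AS created_at,
    u.type,
    array_agg(u.extra_data) AS extra_data
FROM (
    SELECT person_id, created_at, 't1' AS type, extra_data FROM table1
    UNION ALL
    SELECT person_id, created_at, 't2' AS type, extra_data FROM table2
    UNION ALL
    SELECT person_id, created_at, 't3' AS type, extra_data FROM table3
) AS u
GROUP BY u.person_id, u.type
ORDER BY created_at DESC;
```

Like the per-table version, this merges all rows for a (person_id, type) pair, whether or not they are adjacent in created_at order.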

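A few more notes:

  • This will still do a full table scan on every table, because you need created_at and extra_data from every row. It is like the interview question, "What is the time complexity of printing the nodes of a binary tree?"

  • If you want the arrays sorted, you can use array_agg(t1.extra_data ORDER BY t1.created_at) or some other ordering. This is where an index can help you.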
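The queries above merge every row of a (person_id, type) pair. If only consecutive runs should be merged, as in the question's expected output, one common technique is to subtract two row numbers so that each unbroken run gets its own key. The following is a minimal sketch of that idea, not part of the answer above; the names `u`, `numbered`, and `run_id` are illustrative, and it assumes created_at values are distinct across the combined rows (with ties, the ROW_NUMBER order is arbitrary).

```sql
-- Sketch only: group consecutive (person_id, type) runs.
-- Assumes distinct created_at values; u, numbered, run_id are illustrative names.
WITH u AS (
    SELECT person_id, created_at, 't1' AS type, extra_data FROM table1
    UNION ALL
    SELECT person_id, created_at, 't2' AS type, extra_data FROM table2
    UNION ALL
    SELECT person_id, created_at, 't3' AS type, extra_data FROM table3
),
numbered AS (
    SELECT
        u.*,
        -- Overall position minus position within the pair: this difference
        -- stays constant while the same pair repeats without interruption.
        ROW_NUMBER() OVER (ORDER BY created_at DESC)
      - ROW_NUMBER() OVER (PARTITION BY person_id, type
                           ORDER BY created_at DESC) AS run_id
    FROM u
)
SELECT
    person_id,
    type,
    MAX(created_at) AS created_at,
    -- Ordered aggregate, as in the note above, newest first.
    array_agg(extra_data ORDER BY created_at DESC) AS extra_data_array
FROM numbered
GROUP BY person_id, type, run_id
ORDER BY created_at DESC;
```

For the sample rows in the question this should produce four runs: [a, b], [c, c, d], [e], and [f]. Because the window functions need every combined row, this form also reads all three tables in full.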

