Aggregate and concatenate from two tables based on comparison between columns

I have two tables like this:

Table1

A    T
a1   t1
a2   t2
a3   t3
a4   t4
a5   t5
...
...

Table2

E    T
e1   t1
e2   t2
e3   t3
e4   t4
e5   t5
...
...

What I want to achieve is this:

Table3

E    A'
e1   a1,a2,a3
e2   a4,a5,a6
...
...

The aggregation A' is done like this: in Table2, each e has a value t in column T, and with that t you look up the last 3 rows in Table1 whose T is less than the t in question. So a1, a2, a3 are the values of A whose T values are less than t1, the T value of the Table2 row whose E is e1.

I know that I could write two queries for this, like this:

ResultSet (rt) -> select t from e

and then iterate over the ResultSet and do something like this:

select A from Table1 where t < rt[i] limit 3 - not sure how to concatenate here :)

but I'm pretty sure this is utterly inefficient. There should be a better way to do this.

I'm working with PostgreSQL.

If it had been a dataframe from a file, I would use Python's pandas. I also know that pandas has read_sql, but the tables are huge and I don't want to load a whole table into memory. I think it won't, but I'm not sure either; anyway, that's a separate story.

How do we solve this in SQL? Any ideas, please.

In table 2 for each e there is a value in column T: t and with that t you look for the last 3 values in Table 1 that are less than the t in question.

I don't understand how the results follow this logic. But based on your description, you can use a lateral join:

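-- For each row of t2, aggregate the 3 rows of t1 with the largest T below t2.T.
select t2.*, t1.the_as
from t2 left join lateral
     (select array_agg(t1.a) as the_as
      from (select t1.*
            from t1
            where t1.T < t2.T  -- strictly "less than", as the question describes; use <= to include ties
            order by t1.T desc
            limit 3
           ) t1
     ) t1
     on 1=1;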

Note that this uses arrays rather than strings, because I think arrays are a better data structure for storing multiple values. That said, you can use string_agg() instead if you really want a string. The syntax would be string_agg(t1.a, ',').
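
For reference, here is a rough sketch of the string version written out in full. It assumes the same table names t1 and t2 as the query above, that the columns are named a, e and t, and that a is a text column (string_agg() needs text input, so cast with a::text otherwise). The aliases x and last3 and the output name a_list are made up for this example.

```sql
-- Sketch only: assumes t1(a, t) and t2(e, t), with a being text.
select t2.e, last3.a_list
from t2
left join lateral (
    -- Concatenate the picked rows, oldest first, e.g. 'a1,a2,a3'.
    select string_agg(x.a, ',' order by x.t) as a_list
    from (
        select t1.a, t1.t
        from t1
        where t1.t < t2.t      -- only rows strictly below this row's t
        order by t1.t desc     -- newest first, so limit keeps the last 3
        limit 3
    ) x
) last3 on true;
```

Because the lateral subquery is an aggregate without GROUP BY, it always returns exactly one row, so rows of t2 with no matching t1 rows still appear, with a_list as NULL. An index on t1(t) will probably help a lot on large tables, since each row of t2 then only needs a short index lookup rather than a scan.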
