How to build an event table from three separate tables that shows incremental change over time?

Question

I'm trying to build a dataset that shows incremental change over time for some product attributes. The data is in AWS Athena in three separate tables; each stores different attributes, and each can be updated independently at different times. tbl1 can be joined to tbl2, and tbl2 can be joined to tbl3. The relationship between the tables is always one-to-one, so in this example tbl1.id=1 only ever relates to tbl2.id=2, and tbl2.id=2 only relates to tbl3.id=3:

tbl1
| id | updated_at       | bool  |
| 1  | 2019-09-10 06:00 | True  |
| 1  | 2020-08-05 10:00 | False |
| 1  | 2020-09-03 15:00 | True  |

tbl2
| id | tbl1_id | updated_at       | desc    |
| 2  | 1       | 2019-09-10 06:00 | thing 1 |

tbl3
| id | tbl2_id | updated_at       | value |
| 3  | 2       | 2019-09-10 06:00 | 100   |
| 3  | 2       | 2019-09-19 09:00 | 50    |
| 3  | 2       | 2019-12-02 11:00 | 20    |

I'm trying to write a query that joins this data into a single table with one row per incremental update. From the tables above, there was an initial insert on 2019-09-10 followed by four further changes across tbl1 and tbl3, so the result should be five rows like these:

| tbl1_id | tbl1_updated_at  | bool  | tbl2_id | tbl2_updated_at  | desc   | tbl3_id | tbl3_updated_at  | value |
| 1       | 2019-09-10 06:00 | True  | 2       | 2019-09-10 06:00 | thing 1 | 3       | 2019-09-10 06:00 | 100   |
| 1       | 2019-09-10 06:00 | True  | 2       | 2019-09-10 06:00 | thing 1 | 3       | 2019-09-19 09:00 | 50    |
| 1       | 2019-09-10 06:00 | True  | 2       | 2019-09-10 06:00 | thing 1 | 3       | 2019-12-02 11:00 | 20    |
| 1       | 2020-08-05 10:00 | False | 2       | 2019-09-10 06:00 | thing 1 | 3       | 2019-12-02 11:00 | 20    |
| 1       | 2020-09-03 15:00 | True  | 2       | 2019-09-10 06:00 | thing 1 | 3       | 2019-12-02 11:00 | 20    |

I started with the idea of joining everything together and filtering with a WHERE clause, like this:

select
  *
from
  tbl1
  left join tbl2 on tbl1.id = tbl2.tbl1_id
  left join tbl3 on tbl2.id = tbl3.tbl2_id
-- where ???   (unsure which condition would keep exactly one row per change)

But I couldn't get it working, and I'm not sure this approach can work at all. Perhaps some window function would do it? It feels like this should be possible in SQL, but after two days of trying I'm completely at a loss!

Answer 1

This is quite complicated. It would be simpler if every table carried the tbl1 id.

In any case, the idea is to combine the rows of all three tables with union all, tagging each row with its tbl1 id and its own updated_at. Then aggregate, so that there is one row per id and updated_at.

Finally, use last_value() with the ignore nulls option to carry forward the most recent non-null value of each attribute:

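with t as (
      -- One row per (tbl1 id, timestamp). A column is null when its
      -- table did not change at that timestamp.
      select id, updated_at, max(bool) as bool, max(descr) as descr, max(value) as value
      from (select tbl1.id, tbl1.updated_at, tbl1.bool, null as descr, null as value
            from tbl1
            union all
            -- DESC is a reserved word, so the column name must be quoted
            select tbl2.tbl1_id, tbl2.updated_at, null, tbl2."desc", null
            from tbl2
            union all
            -- tbl3 has no tbl1 id, so it is reached through tbl2;
            -- use tbl3's own timestamp, not tbl2's
            select tbl2.tbl1_id, tbl3.updated_at, null, null, tbl3.value
            from tbl2 join
                 tbl3
                 on tbl2.id = tbl3.tbl2_id
           ) u
      group by id, updated_at
     )
-- Carry the most recent non-null value of each attribute forward in time.
select id, updated_at,
       last_value(bool) ignore nulls over (partition by id order by updated_at) as bool,
       last_value(descr) ignore nulls over (partition by id order by updated_at) as descr,
       last_value(value) ignore nulls over (partition by id order by updated_at) as value
from t
order by id, updated_at;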
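To sanity-check the logic outside Athena, here is a minimal Python sketch of the same three steps (union all, group by id and timestamp, carry the last non-null value forward) run on the sample data from the question. It is an illustration, not Athena code: the function name `build_events` and the tuple layout of the in-memory tables are hypothetical, and timestamps are kept as `YYYY-MM-DD HH:MM` strings, which sort chronologically.

```python
from collections import defaultdict

# Sample data from the question (hypothetical in-memory layout).
# tbl1 rows: (id, updated_at, bool)
TBL1 = [
    (1, "2019-09-10 06:00", True),
    (1, "2020-08-05 10:00", False),
    (1, "2020-09-03 15:00", True),
]
# tbl2 rows: (id, tbl1_id, updated_at, desc)
TBL2 = [(2, 1, "2019-09-10 06:00", "thing 1")]
# tbl3 rows: (id, tbl2_id, updated_at, value)
TBL3 = [
    (3, 2, "2019-09-10 06:00", 100),
    (3, 2, "2019-09-19 09:00", 50),
    (3, 2, "2019-12-02 11:00", 20),
]

COLUMNS = ("bool", "descr", "value")


def build_events(tbl1, tbl2, tbl3):
    """Return one (tbl1_id, updated_at, bool, descr, value) row per change."""
    # Step 1 ("union all"): one change record per source row, keyed by tbl1 id.
    changes = [(i, ts, "bool", b) for i, ts, b in tbl1]
    changes += [(t1, ts, "descr", d) for _, t1, ts, d in tbl2]
    # tbl3 only knows tbl2's id, so map it to tbl1's id (an inner join).
    tbl2_to_tbl1 = {t2: t1 for t2, t1, _, _ in tbl2}
    changes += [
        (tbl2_to_tbl1[t2], ts, "value", v)
        for _, t2, ts, v in tbl3
        if t2 in tbl2_to_tbl1
    ]

    # Step 2 ("group by id, updated_at"): merge changes sharing a timestamp.
    # If one table changed twice at the same timestamp, the SQL keeps max();
    # this sketch simply keeps the last value seen.
    grouped = defaultdict(dict)
    for i, ts, col, val in changes:
        grouped[(i, ts)][col] = val

    # Step 3 ("last_value ... ignore nulls"): carry the latest known
    # value of every column forward, separately for each tbl1 id.
    latest = {}
    events = []
    for i, ts in sorted(grouped):
        state = latest.setdefault(i, dict.fromkeys(COLUMNS))
        state.update(grouped[(i, ts)])
        events.append((i, ts, state["bool"], state["descr"], state["value"]))
    return events


if __name__ == "__main__":
    for row in build_events(TBL1, TBL2, TBL3):
        print(row)
```

On the sample data this yields the five rows the question expects: the three tbl3 changes with bool still True, then the two tbl1 changes with value held at 20.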

Answer 1 was accepted (2021-04-30 00:11:02).