How do I build an events table from three separate tables showing incremental change over time?

I'm trying to build a dataset that shows incremental change over time for some product attributes. The data is in AWS Athena in three separate tables that each store different attributes, and each table can be updated independently at different times. tbl1 can be joined to tbl2, and tbl2 can be joined to tbl3. There is always a one-to-one relationship between the tables, so in this example tbl1.id=1 will only ever relate to tbl2.id=2, and tbl2.id=2 will only ever relate to tbl3.id=3:

tbl1
| id | updated_at       | bool  |
| 1  | 2019-09-10 06:00 | True  |
| 1  | 2020-08-05 10:00 | False |
| 1  | 2020-09-03 15:00 | True  |

tbl2
| id | tbl1_id | updated_at       | desc    |
| 2  | 1       | 2019-09-10 06:00 | thing 1 |

tbl3
| id | tbl2_id | updated_at       | value |
| 3  | 2       | 2019-09-10 06:00 | 100   |
| 3  | 2       | 2019-09-19 09:00 | 50    |
| 3  | 2       | 2019-12-02 11:00 | 20    |

I'm trying to write a query that joins this data into a single table with one row for each incremental update. In the tables above there was the initial insert on 2019-09-10, then four other changes made across tbl1 and tbl3, so the result should be five rows that look like:

| tbl1_id | tbl1_updated_at  | bool  | tbl2_id | tbl2_updated_at  | desc    | tbl3_id | tbl3_updated_at  | value |
| 1       | 2019-09-10 06:00 | True  | 2       | 2019-09-10 06:00 | thing 1 | 3       | 2019-09-10 06:00 | 100   |
| 1       | 2019-09-10 06:00 | True  | 2       | 2019-09-10 06:00 | thing 1 | 3       | 2019-09-19 09:00 | 50    |
| 1       | 2019-09-10 06:00 | True  | 2       | 2019-09-10 06:00 | thing 1 | 3       | 2019-12-02 11:00 | 20    |
| 1       | 2020-08-05 10:00 | False | 2       | 2019-09-10 06:00 | thing 1 | 3       | 2019-12-02 11:00 | 20    |
| 1       | 2020-09-03 15:00 | True  | 2       | 2019-09-10 06:00 | thing 1 | 3       | 2019-12-02 11:00 | 20    |

I started with the idea of joining everything together and using some WHERE clauses like:

select *
from tbl1
left join tbl2 on tbl1.id = tbl2.tbl1_id
left join tbl3 on tbl2.id = tbl3.tbl2_id
where ???

But I couldn't get it working, and I'm not sure this approach would even work. Perhaps some sort of window function would do it? It feels like it should be possible in SQL, but after two days of trying I'm completely at a loss as to how!

This is quite complicated. It would be simpler if you had the tbl1 id in all the tables.

In any case, the idea is to union all the columns together along with the tbl1 id and updated_at. Then aggregate, so there is one row per id and updated_at.
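To make the intermediate result concrete, here is a minimal sketch of the union all and aggregate step run against the question's sample rows. It uses Python's built-in sqlite3 module as a stand-in for Athena, which is an assumption made only so the example runs locally. The column name descr replaces the question's desc (a reserved word), booleans are stored as 1/0, and the tbl3 branch takes its timestamp from tbl3 so that each tbl3 change gets its own row.

```python
import sqlite3

# In-memory SQLite database standing in for Athena (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tbl1 (id integer, updated_at text, bool integer);
    create table tbl2 (id integer, tbl1_id integer, updated_at text, descr text);
    create table tbl3 (id integer, tbl2_id integer, updated_at text, value integer);
    insert into tbl1 values (1, '2019-09-10 06:00', 1),
                            (1, '2020-08-05 10:00', 0),
                            (1, '2020-09-03 15:00', 1);
    insert into tbl2 values (2, 1, '2019-09-10 06:00', 'thing 1');
    insert into tbl3 values (3, 2, '2019-09-10 06:00', 100),
                            (3, 2, '2019-09-19 09:00', 50),
                            (3, 2, '2019-12-02 11:00', 20);
""")

# Stack the three tables on (tbl1 id, updated_at), leaving NULL for the
# attributes a table does not own, then collapse to one row per id and timestamp.
SPARSE_SQL = """
    select id, updated_at, max(bool) as bool, max(descr) as descr, max(value) as value
    from (select tbl1.id, tbl1.updated_at, tbl1.bool, null as descr, null as value
          from tbl1
          union all
          select tbl2.tbl1_id, tbl2.updated_at, null, tbl2.descr, null
          from tbl2
          union all
          select tbl2.tbl1_id, tbl3.updated_at, null, null, tbl3.value
          from tbl2 join tbl3 on tbl2.id = tbl3.tbl2_id
         ) u
    group by id, updated_at
    order by id, updated_at
"""

sparse_rows = conn.execute(SPARSE_SQL).fetchall()
for row in sparse_rows:
    print(row)
# (1, '2019-09-10 06:00', 1, 'thing 1', 100)
# (1, '2019-09-19 09:00', None, None, 50)
# (1, '2019-12-02 11:00', None, None, 20)
# (1, '2020-08-05 10:00', 0, None, None)
# (1, '2020-09-03 15:00', 1, None, None)
```

Each row is populated only for the attributes that changed at that timestamp. The NULL gaps are what the window step fills in afterwards.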

Finally, use last_value() with the ignore nulls option to get the most recent value that is populated:

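with t as (
      -- one sparse row per tbl1 id and change timestamp
      select id, updated_at, max(bool) as bool, max(descr) as descr, max(value) as value
      from (select tbl1.id, tbl1.updated_at, tbl1.bool, null as descr, null as value
            from tbl1
            union all
            -- descr stands for the question's desc column; desc is a reserved word and would need quoting
            select tbl2.tbl1_id, tbl2.updated_at, null, tbl2.descr, null
            from tbl2
            union all
            -- tbl3 changes use tbl3's own timestamp and reach the tbl1 id through tbl2
            select tbl2.tbl1_id, tbl3.updated_at, null, null, tbl3.value
            from tbl2 join
                 tbl3
                 on tbl2.id = tbl3.tbl2_id
           ) u
      group by id, updated_at
     )
-- carry the latest non-null value of each attribute forward;
-- Athena (Presto/Trino) expects IGNORE NULLS after the call, while some engines accept it inside the parentheses
select id, updated_at,
       last_value(bool) ignore nulls over (partition by id order by updated_at) as bool,
       last_value(descr) ignore nulls over (partition by id order by updated_at) as descr,
       last_value(value) ignore nulls over (partition by id order by updated_at) as value
from t
order by id, updated_at;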
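To check what the ignore nulls window step should return, the carry-forward can be emulated in plain Python. This is only a sketch of the semantics, not the Athena query itself. The hard-coded sparse_rows are the rows that the union all plus group by step yields for the question's sample data, and carry_forward is a hypothetical helper name.

```python
from itertools import groupby
from operator import itemgetter

# Sparse rows (id, updated_at, bool, descr, value): None marks an attribute
# that did not change at that timestamp.
sparse_rows = [
    (1, "2019-09-10 06:00", True, "thing 1", 100),
    (1, "2019-09-19 09:00", None, None, 50),
    (1, "2019-12-02 11:00", None, None, 20),
    (1, "2020-08-05 10:00", False, None, None),
    (1, "2020-09-03 15:00", True, None, None),
]


def carry_forward(rows):
    """Fill each None with the latest non-None value seen earlier for the same id."""
    filled = []
    # Equivalent of "partition by id order by updated_at".
    ordered = sorted(rows, key=itemgetter(0, 1))
    for _, group in groupby(ordered, key=itemgetter(0)):
        last = [None, None, None]  # running bool, descr, value for this id
        for row_id, updated_at, *attrs in group:
            # Compare against None explicitly so that False is kept as a real value.
            last = [new if new is not None else old for new, old in zip(attrs, last)]
            filled.append((row_id, updated_at, *last))
    return filled


events = carry_forward(sparse_rows)
for event in events:
    print(event)
# (1, '2019-09-10 06:00', True, 'thing 1', 100)
# (1, '2019-09-19 09:00', True, 'thing 1', 50)
# (1, '2019-12-02 11:00', True, 'thing 1', 20)
# (1, '2020-08-05 10:00', False, 'thing 1', 20)
# (1, '2020-09-03 15:00', True, 'thing 1', 20)
```

The five filled rows carry the same bool, desc and value combinations as the five rows the question asks for, with a single updated_at per event instead of one timestamp per source table.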
