
How to keep updating a table which is created from multiple tables with different update times

I want to create a table by joining multiple source tables.

This table should contain the entries that were added or updated in the source tables in the last 24 hours.

I will insert the new data, and delete and reinsert the updated data.

The problem is that the source tables are not updated at the same time.

What is the best way to keep this table up to date with the data from the source tables?

The following example code works if table 'a' is updated, but what if tables 'b' and 'c' are updated later? How can I update my new table as well, so that it picks up the updated fields from those tables?

I am using a Snowflake database.

insert into combined_table
select a.id, max(b.shipment_date), b.quantity, c.status 
from table_a a 
left join table_b b on a.id=b.a_id 
left join table_c c on b.id=c.b_id 
where a.record_updated_at > dateadd(HOUR, -24, CURRENT_TIMESTAMP)
group by a.id, b.quantity, c.status

table_a
id  created_at   updated_at 
1   2019-02-14   2019-02-16

table_b
id  a_id  shipment_date  quantity created_at  updated_at  
3   1     2019-02-15     5        2019-02-15  2019-02-16

table_c
id   b_id   status    created_at   updated_at
5    3      Inactive  2019-02-15   2019-02-15

combined_table
id shipment_date  quantity status
1  2019-02-15     5        Inactive

If, for example, the quantity in table_b changes from 5 to 7 and the status in table_c changes to 'Active', how can I update this in my delta table?

table_b
id  a_id  shipment_date  quantity created_at  updated_at  
3   1     2019-02-15     7        2019-02-15  2019-02-16

table_c
id   b_id   status    created_at   updated_at
5    3      Active    2019-02-15   2019-02-16

The combined table should look like the following. What is the best way to do this?

combined_table
id shipment_date  quantity status
1  2019-02-15     7        Active

I'd take a look at Tasks and Streams in Snowflake. These would allow you to set up a stream on each of your tables to capture the changes that have occurred there, and then run a task against those streams on a schedule, but only if there are changes available.
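As a rough sketch of that idea, assuming the table layout from the question: the stream name `table_b_changes`, the task name `refresh_combined_from_b`, the warehouse `my_wh`, and the 60-minute schedule are all hypothetical, and collapsing several `table_b` rows per `a_id` with `max()` is an assumption made only so the merge source has one row per key. Only `table_b` is shown; `table_a` and `table_c` would need their own stream and task along the same lines.

```sql
-- Hypothetical stream that records inserts, updates and deletes on table_b.
create or replace stream table_b_changes on table table_b;

-- Hypothetical task: runs on a schedule, but only when the stream has data.
create or replace task refresh_combined_from_b
  warehouse = my_wh                 -- assumed warehouse name
  schedule = '60 minute'            -- assumed interval
  when system$stream_has_data('TABLE_B_CHANGES')
as
  merge into combined_table t
  using (
      -- An update shows up in the stream as a DELETE row plus an INSERT row;
      -- keeping only INSERT rows gives new rows and the new version of updated rows.
      select a_id,
             max(shipment_date) as shipment_date,
             max(quantity)      as quantity   -- assumption: one row per a_id
      from table_b_changes
      where metadata$action = 'INSERT'
      group by a_id
  ) s
  on t.id = s.a_id
  when matched then
      update set t.shipment_date = s.shipment_date, t.quantity = s.quantity
  when not matched then
      insert (id, shipment_date, quantity)
      values (s.a_id, s.shipment_date, s.quantity);

-- Tasks are created suspended, so the task has to be resumed before it runs.
alter task refresh_combined_from_b resume;
```

Because the stream is read inside a DML statement, its offset advances when the merge commits, so each run only sees changes made since the previous run. Rows deleted from `table_b` are not handled in this sketch.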

From what I understand of your problem, I think you may want to change those joins to full joins. This will cover the case where an ID exists in b or c but doesn't yet exist in a. After that you can use a where predicate with several or conditions to check each table for changes.

Here is a simplified example of what I think you need:

-- set up the example tables
create or replace temporary table table_a (id number, record_updated_at timestamp_ntz);
create or replace temporary table table_b (id number, shipment_date date, record_updated_at timestamp_ntz);
create or replace temporary table table_c (id number, status varchar, record_updated_at timestamp_ntz);

-- add some sample data
insert overwrite into table_a values (1, '2019-01-01T01:00:00'), (2, '2019-01-01T04:00:00');
insert overwrite into table_b values (1, '2019-01-01','2019-01-01T01:00:00'), (3, '2019-01-02','2019-01-01T03:00:00');
insert overwrite into table_c values (1, 'shipped','2019-01-01T01:00:00');

-- return any records that have changed in any table
select
    a.id a_id, 
    a.record_updated_at a_updated,
    b.id b_id, 
    b.record_updated_at b_updated,
    b.shipment_date,
    c.id c_id,
    c.status,
    c.record_updated_at c_updated
from table_a a
full join table_b b on a.id = b.id
full join table_c c on a.id = c.id 
where a.record_updated_at > '2019-01-01T02:00:00'
    or b.record_updated_at  > '2019-01-01T02:00:00'
    or c.record_updated_at > '2019-01-01T02:00:00'
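
One possible way to turn that query into the delete-and-reinsert refresh described in the question is sketched below. It reuses the simplified example tables above; the `combined_table` definition and the helper table `changed_ids` are assumptions added for illustration, and the fixed timestamp stands in for something like `dateadd(hour, -24, current_timestamp())`.

```sql
-- Assumed target table for the simplified example (not part of the original answer).
create or replace temporary table combined_table (id number, shipment_date date, status varchar);

-- Hypothetical helper table: every id touched in any source table since the cutoff.
create or replace temporary table changed_ids as
select id from table_a where record_updated_at > '2019-01-01T02:00:00'
union
select id from table_b where record_updated_at > '2019-01-01T02:00:00'
union
select id from table_c where record_updated_at > '2019-01-01T02:00:00';

begin;

-- Drop the stale versions of the changed rows...
delete from combined_table
where id in (select id from changed_ids);

-- ...and rebuild them from the current state of the source tables.
insert into combined_table (id, shipment_date, status)
select k.id, b.shipment_date, c.status
from changed_ids k
left join table_b b on b.id = k.id
left join table_c c on c.id = k.id;

commit;
```

Collecting the changed ids with `union` first means a change in any one table is enough to refresh the whole combined row, regardless of which table was updated last.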

Alternatively, you can do what Mike has mentioned and use tasks + streams. This is a pretty neat way of doing it too.

Look at the new STREAMS ON VIEW feature.
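
A minimal sketch of what that could look like with the tables from the question. The view name `combined_view` and the stream name `combined_view_changes` are hypothetical. As far as I know, streams on views only support a limited set of query shapes (inner joins are allowed, `GROUP BY` is not), so the sketch uses inner joins and no aggregation; check the Snowflake documentation before relying on this.

```sql
-- Change tracking must be enabled on the tables underneath the view.
alter table table_a set change_tracking = true;
alter table table_b set change_tracking = true;
alter table table_c set change_tracking = true;

-- Hypothetical view that performs the join once.
create or replace view combined_view as
select a.id, b.shipment_date, b.quantity, c.status
from table_a a
join table_b b on a.id = b.a_id
join table_c c on b.id = c.b_id;

-- Hypothetical stream that reports changes to the joined result,
-- whichever underlying table they came from.
create or replace stream combined_view_changes on view combined_view;

-- Inspect pending changes (a plain select does not advance the stream offset).
select * from combined_view_changes;
```

With this setup a single stream covers changes from all three source tables, so one task could consume it instead of one task per table.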

