Delta table update whenever any record gets updated

I have a Databricks Delta table, created on data lake storage, that holds the data shown below.

db_name table_name  location                    table_format    table_type  load_ts
--------------------------------------------------------------------------------------------------------
abc     table1      dbfs:/mnt/abc/data/table1   delta           EXTERNAL    2022-09-14T18:48:02.859+0000
abc     table2      dbfs:/mnt/abc/data/table2   delta           EXTERNAL    2022-09-14T18:48:02.859+0000
xyz     table1      dbfs:/mnt/xyz/data/table1   delta           EXTERNAL    2022-09-14T18:48:02.859+0000
xyz     table2      dbfs:/mnt/xyz/data/table2   delta           EXTERNAL    2022-09-14T18:48:02.859+0000
xyz     table3      dbfs:/mnt/xyz/data/table3   delta           EXTERNAL    2022-09-14T18:48:02.859+0000
--------------------------------------------------------------------------------------------------------

Currently I run this script daily to overwrite the complete table in Databricks. My requirement is that a record should be updated only if something in that particular record has changed; otherwise it should be left alone. Any new records have to be added.

For example, below is the input data frame I get whenever I run the script. [image] One of the records has changed and one new record has been added. Only the changed record should be updated, and the new record should be added to the final table.
The output table is expected to be updated as below. [image]

I tried the Delta merge option, but it updates every matching record, which is not what I want.

I have to keep load_ts untouched for a record when it is neither updated nor newly inserted (load_ts is there to track when a change last happened to that record). If a record is updated, ONLY that record should be updated in the target table.

Is there any way/logic to achieve this?
Any leads appreciated!

I reproduced the same thing in my environment, and it worked fine with the merge operation.

First, I created a sample data frame and ran a merge operation.

I have taken sour as the target and vv as the source. Please follow the code below:

Code:

%sql

-- Upsert rows from the source (vv) into the target (sour), matching on location
MERGE INTO sour AS target
USING vv AS source
ON target.location = source.location
WHEN MATCHED THEN
  UPDATE SET
    target.db_name = source.db_name,
    target.table_name = source.table_name,
    target.location = source.location,
    target.table_type = source.table_type,
    target.load_ts = source.load_ts
WHEN NOT MATCHED THEN
  INSERT (target.db_name, target.table_name, target.location, target.table_format, target.table_type, target.load_ts)
  VALUES (source.db_name, source.table_name, source.location, source.table_format, source.table_type, source.load_ts)

Output: [image]
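
Note that a WHEN MATCHED clause with no extra condition rewrites every row that matches on location, load_ts included. One possible way to leave unchanged rows alone is to add a condition to the matched clause so the update fires only when a tracked column differs. The following is a minimal sketch, not a tested solution for this exact table. It assumes that location uniquely identifies a row, that load_ts in the source (vv) holds the timestamp of the current run, and that db_name, table_name, table_format and table_type are the columns whose changes matter. load_ts is deliberately left out of the comparison because it differs on every run.

```sql
-- Sketch only: assumes location is the unique key and that load_ts in the
-- source (vv) carries the timestamp of the current run.
MERGE INTO sour AS target
USING vv AS source
ON target.location = source.location
-- Update a matched row only when at least one tracked column differs.
-- <=> is Spark SQL's null-safe equality, so NULL compared with NULL counts as "no change".
WHEN MATCHED AND NOT (
      target.db_name      <=> source.db_name
  AND target.table_name   <=> source.table_name
  AND target.table_format <=> source.table_format
  AND target.table_type   <=> source.table_type
) THEN
  UPDATE SET
    db_name      = source.db_name,
    table_name   = source.table_name,
    table_format = source.table_format,
    table_type   = source.table_type,
    load_ts      = source.load_ts
-- Rows whose location is not yet in the target are inserted as new records.
WHEN NOT MATCHED THEN
  INSERT (db_name, table_name, location, table_format, table_type, load_ts)
  VALUES (source.db_name, source.table_name, source.location,
          source.table_format, source.table_type, source.load_ts)
```

With this condition, a matched row whose tracked columns are all equal falls through both clauses and keeps its existing load_ts. To check what a run actually did, DESCRIBE HISTORY sour should show the merge's operationMetrics, which include counts of updated and inserted target rows.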
