

Incremental Load without Last Modified Date and Primary Key field in Azure Data Factory

Posted 2022-12-14 07:55:40 · Tags: azure, azure-data-factory, etl

I am trying to do an incremental load in Azure Data Factory. Most of the tables in the database don't have a last-modified-date column, and I don't have rights to add watermark columns to the tables. Is there any way to do incremental loading without a last-modified-date column or a primary key column?

I don't know which approach to use, so any help would be appreciated. Thanks in advance.

1 Answer

If your source database supports native change data capture (CDC), you can use an ADF mapping data flow to pick up the changes. No timestamp or ID columns are required to identify the changes, because the data flow relies on the database's own change data capture technology.

For a complete walkthrough, see the public documentation: Change data capture in Azure Data Factory and Azure Synapse Analytics

Another possible approach: if you can access both the old data (previously loaded into your sink) and the latest data (the source, including changes), you can use a mapping data flow in ADF and implement hashing to compare the two datasets and pick out the changed rows according to your requirements.

For an implementation of this approach, see this demonstration: Data Flows: How to capture changed data
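To illustrate the hashing comparison outside ADF, here is a minimal Python sketch, assuming both datasets fit in memory as lists of dicts and that `columns` lists every column to compare. The function names `row_hash` and `diff_by_hash` are hypothetical. Because there is no primary key, the whole row is the identity: a changed row shows up as one delete (old values) plus one insert (new values), and a `Counter` keeps exact duplicate rows from being collapsed.

```python
import hashlib
import json
from collections import Counter
from typing import Any, Mapping, Sequence

Row = Mapping[str, Any]


def row_hash(row: Row, columns: Sequence[str]) -> str:
    """SHA-256 of the row's values in a fixed column order."""
    # JSON keeps None ("null") distinct from the string "None";
    # default=str covers types such as datetime or Decimal.
    payload = json.dumps([row.get(c) for c in columns], default=str, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def _take(rows: Sequence[Row], columns: Sequence[str], wanted: Counter) -> list:
    """Pick rows whose hash is still wanted, respecting duplicate counts."""
    remaining = Counter(wanted)
    picked = []
    for r in rows:
        h = row_hash(r, columns)
        if remaining[h] > 0:
            picked.append(r)
            remaining[h] -= 1
    return picked


def diff_by_hash(source: Sequence[Row], sink: Sequence[Row], columns: Sequence[str]):
    """Return (rows_to_insert, rows_to_delete) by comparing full-row hashes."""
    src_counts = Counter(row_hash(r, columns) for r in source)
    sink_counts = Counter(row_hash(r, columns) for r in sink)
    # Counter subtraction keeps only positive counts.
    new_counts = src_counts - sink_counts    # in source, not (yet) in sink
    gone_counts = sink_counts - src_counts   # in sink, no longer in source
    return _take(source, columns, new_counts), _take(sink, columns, gone_counts)


columns = ["name", "city"]
sink = [{"name": "Ann", "city": "Oslo"}, {"name": "Bob", "city": "Rome"}]
source = [
    {"name": "Ann", "city": "Oslo"},
    {"name": "Bob", "city": "Paris"},
    {"name": "Cy", "city": "Lima"},
]
inserts, deletes = diff_by_hash(source, sink, columns)
print(inserts)  # [{'name': 'Bob', 'city': 'Paris'}, {'name': 'Cy', 'city': 'Lima'}]
print(deletes)  # [{'name': 'Bob', 'city': 'Rome'}]
```

Note that this still reads both datasets in full on every run; it only avoids rewriting unchanged rows. In a mapping data flow, the same idea is typically expressed with a derived column holding a row hash (for example `sha2(256, columns())`) on each side, followed by a comparison such as an Exists transformation.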