
SSIS ETL Transform-Load: How to Handle Create / Update (i.e. UPSERT) for Foreign Key Table Data?

I am performing ETL on a set of Office, Employee, and Location tables, following the standard practice of first bringing all the data into staging tables by way of Extract packages, and then performing Transform-Load on each staging table to get the data into the respective target table(s).

In each of my Transform-Load SSIS packages, I perform CUD (Create, Update, Delete) using Merge Join and Conditional Split transformations.

This works fine when the data in the staging table is 1-to-1 with the corresponding real table. In the scenario below (see image), the SampleLocation table is 1-to-1 with the StageSampleLocation table.
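A rough sketch of that CUD split in plain Python rather than SSIS, assuming each row is identified by a business key and that comparing the remaining column values is enough to detect a change. The function name `split_cud`, the keys, and the address columns are hypothetical and only illustrate the logic of a full outer join followed by a three-way split:

```python
def split_cud(staging, target):
    """Classify rows the way a full outer join plus a three-way split would.

    Both arguments map a business key to a tuple of column values.
    """
    # In staging but not in the target: rows to create.
    creates = {k: v for k, v in staging.items() if k not in target}
    # In both, but with different values: rows to update.
    updates = {k: v for k, v in staging.items() if k in target and target[k] != v}
    # In the target but missing from staging: rows to delete.
    deletes = [k for k in target if k not in staging]
    return creates, updates, deletes


# Hypothetical sample data (key -> (street, city)).
stage_location = {
    1: ("1 Main St", "Austin"),
    2: ("9 Oak Ave", "Dallas"),
    4: ("5 Elm Rd", "Waco"),
}
sample_location = {
    1: ("1 Main St", "Austin"),
    2: ("9 Oak Avenue", "Dallas"),
    3: ("7 Pine Ln", "Houston"),
}

creates, updates, deletes = split_cud(stage_location, sample_location)
```

Here key 4 lands in `creates`, key 2 in `updates`, and key 3 in `deletes`, while key 1 is unchanged and passes through untouched.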

The trouble I am having is deciding how to handle a situation where the staging table has data that will go into the foreign key table(s).

The problem explained

If you take a look at the following database diagram...

[Image: database diagram]

The data from StageSampleOffice goes into SampleOffice for the fields that hold Office data. In addition to office data, StageSampleOffice has Person data: in this example, the OfficeManagerName field needs to be looked up in the FK table SamplePerson. If the name doesn't exist in the SamplePerson table, it needs to be inserted into SamplePerson first; the PersonId PK value for that person is then retrieved and stored, in my Data Flow Task, as the FK value in the row for the imported Office in the SampleOffice table.

Similarly, the address info in StageSampleOffice needs to be looked up in the SampleLocation FK table, and if the address doesn't exist, a new one needs to be inserted with the corresponding values from StageSampleOffice. Once that is done, the LocationId for the address is stored as the FK in the SampleOffice table.
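The lookup-or-insert sequence described above can be sketched as two set-based statements. This is only an illustration using Python's built-in sqlite3 in place of SQL Server and an SSIS Data Flow Task; the columns `Name`, `OfficeName`, `OfficeId`, and `OfficeManagerId` are assumed for the example, and only the manager lookup is shown (the location lookup would follow the same pattern):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SamplePerson (PersonId INTEGER PRIMARY KEY, Name TEXT UNIQUE);
    CREATE TABLE SampleOffice (
        OfficeId INTEGER PRIMARY KEY,
        OfficeName TEXT,
        OfficeManagerId INTEGER REFERENCES SamplePerson(PersonId)
    );
    CREATE TABLE StageSampleOffice (OfficeName TEXT, OfficeManagerName TEXT);

    -- Hypothetical sample rows: one manager already exists, one is new.
    INSERT INTO SamplePerson (Name) VALUES ('Ann Lee');
    INSERT INTO StageSampleOffice VALUES
        ('North', 'Ann Lee'), ('South', 'Bo Chan'), ('East', 'Bo Chan');
""")

# Step 1: insert manager names that are not in SamplePerson yet.
conn.execute("""
    INSERT INTO SamplePerson (Name)
    SELECT DISTINCT s.OfficeManagerName
    FROM StageSampleOffice AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM SamplePerson AS p WHERE p.Name = s.OfficeManagerName
    )
""")

# Step 2: load offices, resolving each manager name to its PersonId.
conn.execute("""
    INSERT INTO SampleOffice (OfficeName, OfficeManagerId)
    SELECT s.OfficeName, p.PersonId
    FROM StageSampleOffice AS s
    JOIN SamplePerson AS p ON p.Name = s.OfficeManagerName
""")

offices = conn.execute("""
    SELECT o.OfficeName, p.Name
    FROM SampleOffice AS o
    JOIN SamplePerson AS p ON p.PersonId = o.OfficeManagerId
    ORDER BY o.OfficeName
""").fetchall()
person_count = conn.execute("SELECT COUNT(*) FROM SamplePerson").fetchone()[0]
```

Because the missing names are inserted before the offices are loaded, the join in step 2 always finds a PersonId, and a manager who appears on several offices is inserted only once.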

As you can see, data for SampleLocation and SamplePerson could come into the system from 2 or more sources. In the example above, for SampleLocation I get a Location data file that has only the addresses. I also get addresses as part of Office records from various office types, which arrive in the StageSampleOffice table.

What I have tried so far

I have already separated the Extract workflows from the Transform-Load workflows. I have 1 extract package per staging table, which essentially reads data from the source (flat file or table), truncates the staging table, and imports everything as-is into the staging table.

I am thinking that

  • for Person data that is present in StageSampleOffice, I will first insert the data into the StageSamplePerson table (not shown in the diagram) and then execute the Transform-Load package for SamplePerson, which will do the Create or Update for those Persons, and
  • for Location data that is present in StageSampleOffice, I will first insert the data into the StageSampleLocation table and then execute the Transform-Load package for SampleLocation, which will do the Create or Update for those Locations.

This way, all the FK rows will be present in the respective tables when the flow returns to my main package, which does the Transform-Load for the SampleOffice table.
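As a sketch of that flow, the functions below use Python's sqlite3 as a stand-in for the SSIS packages. The functions `load_person` and `load_office` and the columns `Name`, `OfficeName`, and `OfficeManagerId` are hypothetical. The person load only creates rows (with just a name there is nothing to update), and the sketch appends to StageSamplePerson rather than truncating it, which is an assumption about how the two sources would be combined:

```python
import sqlite3


def load_person(conn):
    # Stand-in for the SamplePerson Transform-Load package: create missing rows.
    conn.execute("""
        INSERT INTO SamplePerson (Name)
        SELECT DISTINCT s.Name FROM StageSamplePerson AS s
        WHERE NOT EXISTS (SELECT 1 FROM SamplePerson AS p WHERE p.Name = s.Name)
    """)


def load_office(conn):
    # 1. Copy the person data found in office rows into the person staging table.
    conn.execute("""
        INSERT INTO StageSamplePerson (Name)
        SELECT DISTINCT OfficeManagerName FROM StageSampleOffice
    """)
    # 2. Run the person load so every FK row exists before offices are loaded.
    load_person(conn)
    # 3. Load offices, resolving each manager name to its PersonId.
    conn.execute("""
        INSERT INTO SampleOffice (OfficeName, OfficeManagerId)
        SELECT s.OfficeName, p.PersonId
        FROM StageSampleOffice AS s
        JOIN SamplePerson AS p ON p.Name = s.OfficeManagerName
    """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SamplePerson (PersonId INTEGER PRIMARY KEY, Name TEXT UNIQUE);
    CREATE TABLE StageSamplePerson (Name TEXT);
    CREATE TABLE SampleOffice (
        OfficeId INTEGER PRIMARY KEY,
        OfficeName TEXT,
        OfficeManagerId INTEGER REFERENCES SamplePerson(PersonId)
    );
    CREATE TABLE StageSampleOffice (OfficeName TEXT, OfficeManagerName TEXT);

    -- Hypothetical sample rows from two sources: a person file and an office file.
    INSERT INTO SamplePerson (Name) VALUES ('Ann Lee');
    INSERT INTO StageSamplePerson VALUES ('Cy Roe');
    INSERT INTO StageSampleOffice VALUES ('North', 'Ann Lee'), ('South', 'Bo Chan');
""")

load_office(conn)

names = [row[0] for row in conn.execute("SELECT Name FROM SamplePerson ORDER BY Name")]
offices = conn.execute("""
    SELECT o.OfficeName, p.Name
    FROM SampleOffice AS o
    JOIN SamplePerson AS p ON p.PersonId = o.OfficeManagerId
    ORDER BY o.OfficeName
""").fetchall()
```

The appeal of this layout is that the person load stays in one place no matter which source the names came from; the cost is an extra copy into the staging table before each office load.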

Is this a good idea, or is there a better way?

Thank you!

Seems like "six of one, half a dozen of the other" to me.

Either way, you're eventually checking every person and every location to see whether it is already in the final destination table, and then either doing an insert or not.

Whether or not you "pre-condense" them in the staging tables, the workload will be the same. I would go with the approach that seems more intuitive to you, because that is the one you will find more maintainable in the future.

