SSIS ETL Transform-Load How To Handle Create / Update (i.e. UPSERT) for Foreign Key Table Data?
I am performing ETL on a set of Office, Employee, and Location tables by following the standard practice of first bringing all the data into staging tables by way of Extract packages, and then performing Transform-Load on each staging table to get the data into the respective table(s).
In each of my Transform-Load SSIS packages, I am performing CUD (Create, Update, Delete) by using MERGE JOIN and CONDITIONAL Splits.
This works fine when the data in the staging table is 1-to-1 with the corresponding real table. In the scenario below (see image), the SampleLocation table is 1-to-1 with the StageSampleLocation table.
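The MERGE JOIN plus CONDITIONAL Split pattern mentioned above amounts to comparing staging rows with target rows on a business key and routing each row to a create, update, or delete branch. A minimal sketch of that routing logic in plain Python, not SSIS; the function name classify_rows, the keys, and the City column are assumptions made up for illustration:

```python
def classify_rows(staging, target):
    """Route business keys to create / update / delete branches.

    Both arguments are dicts mapping a business key to a dict of column
    values. This mimics a full outer join on the key followed by a split:
    key only in staging -> create, key only in target -> delete,
    key in both with different values -> update.
    """
    creates = sorted(key for key in staging if key not in target)
    updates = sorted(
        key for key in staging
        if key in target and staging[key] != target[key]
    )
    deletes = sorted(key for key in target if key not in staging)
    return creates, updates, deletes


# Hypothetical location rows keyed by an assumed business key.
staging_rows = {"L1": {"City": "Austin"}, "L2": {"City": "Dallas"}}
target_rows = {"L2": {"City": "Houston"}, "L3": {"City": "Waco"}}

creates, updates, deletes = classify_rows(staging_rows, target_rows)
print(creates, updates, deletes)
```

Rows that match on the key and carry identical values fall into none of the three branches, which corresponds to the "no change" output of the split.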
The trouble I am having is deciding how to handle a situation where the staging table has data that will go into the foreign key table(s).
If you take a look at the following database diagram...
The data from StageSampleOffice goes into SampleOffice for the fields that hold Office data. In addition to office data, StageSampleOffice has Person data: in this example, the OfficeManagerName field will need to be looked up in the FK table SamplePerson. If the name doesn't exist in the SamplePerson table, it will need to be inserted into SamplePerson first, and the PersonId PK value for that person will then be retrieved and stored, in my Data Flow Task, as the FK value in the SampleOffice row for the imported Office.
Similarly, for the address info in StageSampleOffice, the details will need to be looked up in the SampleLocation FK table, and if the address doesn't exist, a new one needs to be inserted with the corresponding values from StageSampleOffice. Once that is done, the LocationId for the address will be stored as the FK in the SampleOffice table.
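The lookup-or-insert step described above for SamplePerson (and, in the same way, SampleLocation) can be sketched row by row as follows. This is only an illustration using Python's sqlite3 module as a stand-in database, not an SSIS component; the Name column and the function get_or_create_person are assumptions, since the post only names the PersonId key and the OfficeManagerName staging field.

```python
import sqlite3


def get_or_create_person(conn, name):
    """Return the PersonId for name, inserting a SamplePerson row if missing."""
    row = conn.execute(
        "SELECT PersonId FROM SamplePerson WHERE Name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]  # person already exists: reuse the key
    cur = conn.execute("INSERT INTO SamplePerson (Name) VALUES (?)", (name,))
    return cur.lastrowid  # key generated for the new row


conn = sqlite3.connect(":memory:")
# Assumed shape of the FK table; only PersonId appears in the post.
conn.execute(
    "CREATE TABLE SamplePerson (PersonId INTEGER PRIMARY KEY, Name TEXT UNIQUE)"
)

first_id = get_or_create_person(conn, "Ann Lee")   # inserts the row
second_id = get_or_create_person(conn, "Ann Lee")  # finds the existing row
person_count = conn.execute("SELECT COUNT(*) FROM SamplePerson").fetchone()[0]
```

Doing this one row at a time is simple to follow but costs a round trip per row, which is one reason to consider handling the FK tables as a set before loading SampleOffice.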
As you can see, data for SampleLocation and SamplePerson could come into the system from 2 or more sources. In the example above, for SampleLocation I get a Location data file that only has the addresses. I also get addresses as part of the Office records from various office types, which come in through the StageSampleOffice table.
I have already separated the Extract workflows from the Transform-Load workflows. I have 1 extract package per staging table, which essentially reads data from the source (flat file or table), truncates the staging table, and imports everything as-is into the staging table.
I am thinking that:
For the Person data present in StageSampleOffice, I will first insert the data into the StageSamplePerson table (not shown in diagram) and then execute the Transform-Load package for SamplePerson, which will do the Create or Update for those Persons.
For the Location data present in StageSampleOffice, I will first insert the data into the StageSampleLocation table (not shown in diagram) and then execute the Transform-Load package for SampleLocation, which will do the Create or Update for those Locations.
This way, all the FK rows will be present in the respective tables when the flow returns to my main package that does the Transform-Load for the SampleOffice table.
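A minimal sketch of this plan, written as plain SQL run through Python's sqlite3 module rather than as SSIS packages. Every column name here (OfficeName, Name, Address, ManagerPersonId, OfficeId) and the function load_offices are assumptions for illustration, and the stand-in for the Person and Location Transform-Load packages only covers the Create half, not Update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE StageSampleOffice (OfficeName TEXT, OfficeManagerName TEXT, Address TEXT);
CREATE TABLE StageSamplePerson (Name TEXT);
CREATE TABLE StageSampleLocation (Address TEXT);
CREATE TABLE SamplePerson (PersonId INTEGER PRIMARY KEY, Name TEXT UNIQUE);
CREATE TABLE SampleLocation (LocationId INTEGER PRIMARY KEY, Address TEXT UNIQUE);
CREATE TABLE SampleOffice (
    OfficeId INTEGER PRIMARY KEY,
    OfficeName TEXT,
    ManagerPersonId INTEGER REFERENCES SamplePerson (PersonId),
    LocationId INTEGER REFERENCES SampleLocation (LocationId)
);
INSERT INTO StageSampleOffice VALUES
    ('North', 'Ann Lee', '1 Main St'),
    ('South', 'Ann Lee', '9 Oak Ave');
""")


def load_offices(conn):
    # Step 1: copy the distinct Person and Location values out of the
    # office staging table into their own staging tables.
    conn.execute("""INSERT INTO StageSamplePerson (Name)
        SELECT DISTINCT OfficeManagerName FROM StageSampleOffice""")
    conn.execute("""INSERT INTO StageSampleLocation (Address)
        SELECT DISTINCT Address FROM StageSampleOffice""")
    # Step 2: stand-in for the Person / Location Transform-Load packages:
    # insert only the values that are not in the FK tables yet.
    conn.execute("""INSERT INTO SamplePerson (Name)
        SELECT DISTINCT s.Name FROM StageSamplePerson s
        WHERE NOT EXISTS (SELECT 1 FROM SamplePerson p WHERE p.Name = s.Name)""")
    conn.execute("""INSERT INTO SampleLocation (Address)
        SELECT DISTINCT s.Address FROM StageSampleLocation s
        WHERE NOT EXISTS
            (SELECT 1 FROM SampleLocation l WHERE l.Address = s.Address)""")
    # Step 3: load offices; every FK row now exists, so plain joins
    # resolve PersonId and LocationId.
    conn.execute("""INSERT INTO SampleOffice (OfficeName, ManagerPersonId, LocationId)
        SELECT o.OfficeName, p.PersonId, l.LocationId
        FROM StageSampleOffice o
        JOIN SamplePerson p ON p.Name = o.OfficeManagerName
        JOIN SampleLocation l ON l.Address = o.Address""")
    conn.commit()


load_offices(conn)
offices = conn.execute(
    "SELECT OfficeName, ManagerPersonId, LocationId FROM SampleOffice ORDER BY OfficeName"
).fetchall()
```

Because the FK tables are filled before the office load, the final step needs no per-row insert logic, only lookups.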
Is this a good idea, or is there a better way?
Thank you!
Seems like "6 of one, half-dozen of the other" to me.
Either way, you're eventually checking every person and every location to see whether it is already in the final destination table, and inserting it if it is not.
Whether or not you "pre-condense" them in the staging tables, the workload will be the same. I would go with the approach that seems more intuitive to you, because that will be the one you find more maintainable in the future.