简体繁体中英

SSIS ETL Transform-Load How To Handle Create / Update (i.e. UPSERT) for Foreign Key Table Data?

原文 2015-12-30 22:21:20 7 1 sql-server/ architecture/ ssis/ etl/ ssis-2012

I am performing ETL on a set of Office , Employee , Location tables by following the standard practice of bringing all the data into Staging tables, first by way of Extract packages, and then performing Transform-Load on each of the staging table to get the data into the respective table(s).

In each of my Transform-Load SSIS Packages, I am performing CUD ( Create, Update, Delete ) by using MERGE JOIN and CONDITIONAL Splits.

This works fine when the data in Staging table is 1-to-1 with the corresponding real table. In the scenario below (see image) the SampleLocation table is 1-to-1 with the StageSampleLocation table.

The trouble I am having is deciding how to handle a situation where the Staging table has data that will go into the Foreign key table(s).

The Problem explained

If you take a look at the following database diagram...

The data from StageSampleOffice goes into SampleOffice for fields that have Office data. In addition to office data, the StageSampleOffice has Person data -- in this example, OfficeManagerName field will need to be looked up in the FK table SamplePerson . If the name doesn't exist in SamplePerson table, this name will need to be inserted in SamplePerson first, and the PersonId PK value for that person will be retrieved and stored as the FK value in the row for the imported Office in the SampleOffice table, in my Data Flow Task.

Similarly, for the address info in StageSampleOffice , the details will need to be looked up in the SampleLocation FK table, and if the address doesn't exist, a new one needs to be inserted with the corresponding values from StageSampleOffice . Once that is done, the LocationId for the address will be stored as FK in the SampleOffice table.

As you can see, data for SampleLocation and SamplePerson could come into the system from 2 or more sources. In the example above, for SampleLocation I get a Location data file that only has the addresses. I also get addresses as part of Office records from various office types, that come in the StageSampleOffice table.

What I have tried so far

I have already separated the Extract workflows from the Transform-Load workflows. I have 1 extract package per staging table, that essentially reads data from the source (flatfile or table), truncates the staging table, and imports everything as-is into the staging table.

I am thinking that

for Person data that is present in the StageSampleOffice , I will first insert the data into the StageSamplePerson table (not shown in diagram) and then execute the Transform-Load Package for SamplePerson that will do the Create or Update for those Persons and
for Location data that is present in the StageSampleOffice , I will first insert the data into the StageSampleLocation table (not shown in diagram) and then execute the Transform-Load Package for SampleLocation that will do the Create or Update for those Location.

This way, all the FK rows will be present in the respective table when the flow returns to my main package that does the Transform-Load for the SampleOffice table.

Is this a good idea, or is there a better way.

Thank you!

1 answers

Seems like "6 of one, half-dozen of the other" to me.

Either way you're eventually checking every person and every location to see if it is already in the final destination table and doing an insert or not.

Whether you "pre-condense" them in the staging tables or not the workload will be the same. I would go with the approach that seems more intuitive to you, because that will be the one that you will find more maintainable in the future.

How to use the master table as a lookup to populate the foreign key in the Reference table during SSIS data load

how to update table with foreign key

How do I create a table with a foreign key reference?

Why can I not create a foreign key for this table?

What are the ways to join foreign key for fact table in ETL package?

Should you create a Foreign key table for Simple never changing data e.g. mathematical operations vs using descriptive text

Fact table partitioning: how to handle updates in ETL?

Creating foreign key columns for fact table in SSIS

How to create table with multiple rows and columns using only SELECT clause (i.e. using SELECT without FROM clause)

Insert or Update Data into Table with Foreign Key from Another Table

暂无

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

Related Question How to use the master table as a lookup to populate the foreign key in the Reference table during SSIS data load how to update table with foreign key How do I create a table with a foreign key reference? Why can I not create a foreign key for this table? What are the ways to join foreign key for fact table in ETL package? Should you create a Foreign key table for Simple never changing data e.g. mathematical operations vs using descriptive text Fact table partitioning: how to handle updates in ETL? Creating foreign key columns for fact table in SSIS How to create table with multiple rows and columns using only SELECT clause (i.e. using SELECT without FROM clause) Insert or Update Data into Table with Foreign Key from Another Table

Related Tags

SSIS ETL Transform-Load How To Handle Create / Update (i.e. UPSERT) for Foreign Key Table Data?

Question

The Problem explained

What I have tried so far

1 answers

solution1 1 2015-12-31 17:00:23

solution1
1 2015-12-31 17:00:23