简体   繁体   中英

SSIS ETL Transform-Load How To Handle Create / Update (i.e. UPSERT) for Foreign Key Table Data?

I am performing ETL on a set of Office , Employee , Location tables by following the standard practice of bringing all the data into Staging tables, first by way of Extract packages, and then performing Transform-Load on each of the staging table to get the data into the respective table(s).

In each of my Transform-Load SSIS Packages, I am performing CUD ( Create, Update, Delete ) by using MERGE JOIN and CONDITIONAL Splits.

This works fine when the data in Staging table is 1-to-1 with the corresponding real table. In the scenario below (see image) the SampleLocation table is 1-to-1 with the StageSampleLocation table.

The trouble I am having is deciding how to handle a situation where the Staging table has data that will go into the Foreign key table(s).

The Problem explained

If you take a look at the following database diagram...

在此处输入图片说明

The data from StageSampleOffice goes into SampleOffice for fields that have Office data. In addition to office data, the StageSampleOffice has Person data -- in this example, OfficeManagerName field will need to be looked up in the FK table SamplePerson . If the name doesn't exist in SamplePerson table, this name will need to be inserted in SamplePerson first, and the PersonId PK value for that person will be retrieved and stored as the FK value in the row for the imported Office in the SampleOffice table, in my Data Flow Task.

Similarly, for the address info in StageSampleOffice , the details will need to be looked up in the SampleLocation FK table, and if the address doesn't exist, a new one needs to be inserted with the corresponding values from StageSampleOffice . Once that is done, the LocationId for the address will be stored as FK in the SampleOffice table.

As you can see, data for SampleLocation and SamplePerson could come into the system from 2 or more sources. In the example above, for SampleLocation I get a Location data file that only has the addresses. I also get addresses as part of Office records from various office types, that come in the StageSampleOffice table.

What I have tried so far

I have already separated the Extract workflows from the Transform-Load workflows. I have 1 extract package per staging table, that essentially reads data from the source (flatfile or table), truncates the staging table, and imports everything as-is into the staging table.

I am thinking that

  • for Person data that is present in the StageSampleOffice , I will first insert the data into the StageSamplePerson table (not shown in diagram) and then execute the Transform-Load Package for SamplePerson that will do the Create or Update for those Persons and
  • for Location data that is present in the StageSampleOffice , I will first insert the data into the StageSampleLocation table (not shown in diagram) and then execute the Transform-Load Package for SampleLocation that will do the Create or Update for those Location.

This way, all the FK rows will be present in the respective table when the flow returns to my main package that does the Transform-Load for the SampleOffice table.

Is this a good idea, or is there a better way.

Thank you!

Seems like "6 of one, half-dozen of the other" to me.

Either way you're eventually checking every person and every location to see if it is already in the final destination table and doing an insert or not.

Whether you "pre-condense" them in the staging tables or not the workload will be the same. I would go with the approach that seems more intuitive to you, because that will be the one that you will find more maintainable in the future.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM