
Reference foreign keys using SSIS-Lookup

I am trying to create an ETL process with two Excel data sources (S1 with ~300 rows and S2 with ~7000 rows). S1 contains project information and employee details; S2 contains the number of hours each employee worked on each project at a given timestamp.

I want to insert the number of hours each employee worked on each project at a given timestamp into the fact table, referencing the existing primary keys in the dimension tables. If an entry is not yet present in a dimension table, I want to add a new entry first and use the newly generated id. The destination table structure looks as follows (Data Warehouse, Star Schema): [Image: Destination Table Structure]

In SSIS, I first created three Data Flow Tasks to fill the dimension tables (project, employee and time) with distinct values (using GROUP BY, as S1 and S2 contain a lot of duplicate rows), and then a fourth Data Flow Task (see image below) to insert the fact table data. This is where I'm running into problems:

[Image: Data Flow Task FactTable]

I am using three Lookup transformations to retrieve the foreign keys project_id, employee_id and time_id from the dimension tables (matching on project name, employee number and timestamp). If the id is found, it is passed on all the way to Merge Join 1; if not, a new dimension entry is created (let's say for project) and the generated project_id is passed on instead. The same goes for employee and time.

There are two issues with this:

1) The "amount of hours" (passed on by Multicast four, see image above) is not matched in the final result (No Match).

2) The number of rows being inserted keeps increasing forever (Endless Join, I believe due to the Merge Joins).
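One possible contributor to the growing row count (not confirmed anywhere in this post) is that a join on a non-unique key emits every matching pair of rows. The sketch below imitates that behaviour in plain Python; `merge_join`, the branch names and the sample values are all hypothetical and only illustrate the effect, not the actual package.

```python
def merge_join(left, right, key):
    """Inner join two lists of dicts on `key`; every matching pair is emitted."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Hypothetical branch outputs: after the Multicast, each branch keeps only
# the employee number, which is not unique per source row.
project_branch = [{"employee_no": 7, "project_id": p} for p in (1, 2, 3)]
hours_branch = [{"employee_no": 7, "hours": h} for h in (4.0, 8.0, 2.5)]

joined = merge_join(project_branch, hours_branch, "employee_no")
print(len(joined))  # 3 input rows per side become 9 joined rows
```

If the branches were joined on a key that identifies each source row uniquely, the output would stay at three rows.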

What I've tried:

  • I previously used one UNION instead of three Merge Joins, but this put each foreign key in a separate row instead of merging them into one row.
  • I used Merge (instead of Merge Join) and tried every combination of join and sort conditions I could think of.

I understand that this scenario might be confusing, but thank you for taking the time to look at it! Any help is greatly appreciated.

Solved it

For anybody having similar issues:

Separating the Data Flows that fill the dimension tables from the one that fills the fact table does the trick. It's a clean solution and easier to debug.

Also: don't run the Lookup transformations in parallel, but rather one after another, passing the attributes along. This saves unnecessary Merges as well.
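The idea of chaining the lookups can be sketched outside SSIS in a few lines of Python. This is only an illustration under assumptions: the `lookup` helper, the column names and the sample dimension contents are made up, and the dimensions are assumed to be fully loaded already, so every lookup finds a match.

```python
def lookup(rows, dimension, source_col, key_col):
    """Add `key_col` to each row by looking up row[source_col] in `dimension`."""
    for row in rows:
        yield {**row, key_col: dimension[row[source_col]]}


# Hypothetical dimension contents: business key -> surrogate id.
dim_project = {"Apollo": 1, "Zeus": 2}
dim_employee = {1001: 10, 1002: 11}
dim_time = {"2020-01-06": 100}

source = [
    {"project": "Apollo", "employee_no": 1001, "ts": "2020-01-06", "hours": 8.0},
    {"project": "Zeus", "employee_no": 1002, "ts": "2020-01-06", "hours": 4.5},
]

# Each lookup receives the full row from the previous step, so the hours and
# the keys found so far travel together and no merge is needed at the end.
step1 = lookup(source, dim_project, "project", "project_id")
step2 = lookup(step1, dim_employee, "employee_no", "employee_id")
step3 = lookup(step2, dim_time, "ts", "time_id")

fact_rows = [
    (r["project_id"], r["employee_id"], r["time_id"], r["hours"]) for r in step3
]
print(fact_rows)
```

Because every row keeps all of its columns from one step to the next, the number of output rows always equals the number of source rows.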

To sum up: four Data Flow Tasks, three for filling the dimension tables ONLY and one for filling the fact table ONLY.
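As a rough sketch of that ordering, the same four steps can be written with Python's built-in `sqlite3` module. The table and column names (`staging_hours`, `dim_project`, `fact_hours` and so on) are assumptions for illustration, not the schema from the question, and a single staging table stands in for the two Excel sources.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_hours (project TEXT, employee_no INTEGER, ts TEXT, hours REAL);
CREATE TABLE dim_project  (project_id  INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE dim_employee (employee_id INTEGER PRIMARY KEY, employee_no INTEGER UNIQUE);
CREATE TABLE dim_time     (time_id     INTEGER PRIMARY KEY, ts TEXT UNIQUE);
CREATE TABLE fact_hours   (project_id INTEGER, employee_id INTEGER,
                           time_id INTEGER, hours REAL);
""")
conn.executemany(
    "INSERT INTO staging_hours VALUES (?, ?, ?, ?)",
    [
        ("Apollo", 1001, "2020-01-06", 8.0),
        ("Apollo", 1002, "2020-01-06", 4.0),
        ("Zeus", 1001, "2020-01-07", 6.5),
    ],
)

# Steps 1-3: fill the dimension tables ONLY, with distinct business keys.
conn.execute("INSERT OR IGNORE INTO dim_project (name) "
             "SELECT DISTINCT project FROM staging_hours")
conn.execute("INSERT OR IGNORE INTO dim_employee (employee_no) "
             "SELECT DISTINCT employee_no FROM staging_hours")
conn.execute("INSERT OR IGNORE INTO dim_time (ts) "
             "SELECT DISTINCT ts FROM staging_hours")

# Step 4: fill the fact table ONLY; every key already exists, so each
# source row resolves to exactly one fact row.
conn.execute("""
INSERT INTO fact_hours
SELECT p.project_id, e.employee_id, t.time_id, s.hours
FROM staging_hours s
JOIN dim_project  p ON p.name = s.project
JOIN dim_employee e ON e.employee_no = s.employee_no
JOIN dim_time     t ON t.ts = s.ts
""")

fact_count = conn.execute("SELECT COUNT(*) FROM fact_hours").fetchone()[0]
project_count = conn.execute("SELECT COUNT(*) FROM dim_project").fetchone()[0]
print(fact_count, project_count)
```

Since the dimensions are complete before the fact load starts, the fact step never has to create dimension entries on the fly, which is what made the original single data flow hard to debug.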

See "Loading Multiple Tables using SSIS keeping foreign key relationships": the answer posted by onupdatecascade is basically it.

Good luck!
