About Surrogate Keys in the Loading Process of a Data Warehouse

When loading from the staging table into the fact and dimension tables, do you also load the surrogate key from staging into the dimension table for new rows?

Or do you generate a new surrogate key in the dimension table by using the IDENTITY property on the table (https://docs.microsoft.com/en-us/sql/t-sql/statements/create-table-transact-sql-identity-property?view=sql-server-2017)?

Which approach is correct?

Other information:
* I'm a newbie in ETL and Business Intelligence.
* I'm using only T-SQL, no SSIS.

Thank you!

The question is not very clear. I'll attempt to answer based on what I "think" you are asking, but it would be better to make the question crystal clear to people unfamiliar with the data, and to provide sample data.

I think you are asking whether you need to load entries into a dimension table for records that are being loaded into a fact table, at the same time as the fact table is being loaded.

Generally, the dimension members are loaded into the dimension table before loading data into the fact table. It's just easier to do it this way if at all possible. The steps I would use, in order, are:

  • Load the dimension with any new members in its own stored procedure. This ensures that you now have a surrogate key for every new member. Do this for all dimensions.
  • Create a second stored procedure to load the fact table. Join the staging table to the dimension tables to get the surrogate keys. The code below shows an example for one dimension; just add more joins to more dimensions as needed.
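As a minimal sketch of the first step, here is what the body of the dimension-load procedure could look like, assuming a hypothetical customer dimension (#dimCustomer, #customerStaging, CustomerCode and the sample values are illustrative names, not from the question). Note that the surrogate key is never read from staging: staging only carries the natural key, and the IDENTITY property generates the surrogate key when a new natural key is inserted.

```sql
-- Sketch of the step-1 procedure body (all object names are hypothetical).
create table #dimCustomer
(
    CustomerKey int identity(1,1) primary key, -- surrogate key, generated by SQL Server
    CustomerCode nvarchar(20) not null unique  -- natural (business) key from the source
);

create table #customerStaging
(
    CustomerCode nvarchar(20), -- natural key only: staging carries no surrogate key
    amount int
);

-- One existing member, then a staging batch containing one new customer twice.
insert into #dimCustomer (CustomerCode) values (N'C001');
insert into #customerStaging (CustomerCode, amount)
values (N'C001', 10), (N'C002', 20), (N'C002', 5);

-- Step 1: insert only natural keys the dimension does not have yet;
-- IDENTITY assigns each new member its surrogate key.
insert into #dimCustomer (CustomerCode)
select distinct s.CustomerCode
from #customerStaging s
where s.CustomerCode is not null
  and not exists (select 1
                  from #dimCustomer d
                  where d.CustomerCode = s.CustomerCode);

select CustomerKey, CustomerCode from #dimCustomer order by CustomerKey;
```

C001 keeps key 1 and C002 gets key 2. DISTINCT stops the duplicate staging row from creating two members, and NOT EXISTS makes the step safe to rerun before every fact load.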

The code below populates a sample dimension table and a fact staging table with contrived data, to show how to look up the surrogate key and get the data to be inserted into the fact table.

create table #factstaging
(
    dimension1Value nvarchar(20),
    factmeasure1 int,
    factmeasure2 int
)
create table #dimension1
(
    ID int identity(1,1),
    dimension1Value nvarchar(20)
)
insert into #dimension1
values
('d1 value 1'),
('d1 value 2'),
('d1 value 3')

insert into #factstaging
values
('d1 value 1',22,44),
('d1 value 1',22,44),
('d1 value 2',22,44),
('d1 value 3',22,44)

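create table #fact
(
    Dimension1SurrogateKey int,
    factmeasure1 int,
    factmeasure2 int
)

--contents of stored procedure to insert fact rows:
--join staging to the dimension on the natural key to look up the surrogate key
insert into #fact (Dimension1SurrogateKey, factmeasure1, factmeasure2)
select d1.ID as Dimension1SurrogateKey, s.factmeasure1, s.factmeasure2
from #factstaging s
join #dimension1 d1 on s.dimension1Value = d1.dimension1Value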

Note:

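  • Your data needs to be clean: the lookup is an inner join, so a staging row whose natural key has no match in the dimension is silently dropped from the fact load.
  • If facts arrive before the dimension data, the pattern is different: you need something like a late-arriving dimension pattern, which is a lot more complex.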
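Both notes can be made concrete with a minimal sketch of one common approach to late-arriving dimensions, the "inferred member" technique, assuming a hypothetical product dimension (#dimProduct, #salesStaging, ProductCode, IsInferred and the sample values are illustrative, not from the question). It first lists staging rows that an inner join would drop, then adds each missing natural key as a placeholder member so its facts can still be loaded with a real surrogate key.

```sql
-- Sketch of the inferred-member pattern (all object names are hypothetical).
create table #dimProduct
(
    ProductKey int identity(1,1) primary key, -- surrogate key
    ProductCode nvarchar(20) not null unique,  -- natural key
    ProductName nvarchar(50) null,             -- unknown until the real row arrives
    IsInferred bit not null default 0          -- 1 = placeholder created for a fact
);

create table #salesStaging
(
    ProductCode nvarchar(20),
    quantity int
);

insert into #dimProduct (ProductCode, ProductName) values (N'P1', N'Widget');
insert into #salesStaging (ProductCode, quantity) values (N'P1', 3), (N'P9', 7);

-- Data-quality check: rows an inner join to the dimension would silently drop.
select s.ProductCode, s.quantity
from #salesStaging s
left join #dimProduct d on d.ProductCode = s.ProductCode
where d.ProductKey is null;

-- Late-arriving dimension: add each missing natural key as an inferred member.
insert into #dimProduct (ProductCode, IsInferred)
select distinct s.ProductCode, 1
from #salesStaging s
where s.ProductCode is not null
  and not exists (select 1 from #dimProduct d where d.ProductCode = s.ProductCode);

-- Later, when the real dimension data arrives (simulated here), complete the
-- placeholder in place so the surrogate key already used by facts stays the same.
update #dimProduct
set ProductName = N'Gadget', IsInferred = 0
where ProductCode = N'P9' and IsInferred = 1;
```

The check returns the P9 row, which then becomes member 2. A production version would also have to decide how completing an inferred member interacts with slowly changing dimension history, which is part of why this pattern is a lot more complex.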
