简体   繁体   中英

What should update when an SCD1 change happens on an SCD2 aware table

I am not sure if this is answered in the Kimball Data-Warehouse literature - I could not find an answer.

Lets say you have a table with a couple of SCD2 aware columns, and some SCD1 columns. The question is simply this: Which records are updated when a change to an SCD1 column happens?

Do you change ALL records, the current one and all historic ones? Or do you change only the most recent (open) record? I tend to go by "all", while all my colleagues insist on "most recent only".

The only reason I can give for my opinion is, that I want a COUNT(DISTINCT...) to give the same result when I run it on the business key column, or on an SCD1 column.

Thinking of it, there are two special cases: What if a record does not have a current version, if the most recent one went out of scope and there was no new instance? What if I use my SCD2 logic to prepare records that will become active automatically in the future?

If I only update the most recent open version, than that would mean I do not change anything in the first special case, while in the other case I need to change all future (prepared) records. Thinking of it... I am confident "all" is the correct answer.

Can anyone point to a well known white-paper website (like Kimball, Microsoft) that gives an answer?

All. The intent of type 1 is that facts are associated with the attribute's current value:

在此处输入图像描述

https://www.kimballgroup.com/2013/02/design-tip-152-slowly-changing-dimension-types-0-4-5-6-7/

In a type 2 SCD facts are associated with a "version" of the dimension that was active when the fact occurred. In order for a fact that's associated with a version to be "associated with the attribute's current value" that current value must be updated on all versions of the dimension.

For instance if you have

SalesFact(Date,CustomerId,Amount)

and

CustomerDim(CustomerId, CustomerBK, EffectiveDate, EndEffectiveDate, IsCurrent, Name, SalesPersonId)

And you want to associate historical Sales facts with the customer's current SalesPerson rather than the SalesPerson active at the time of the sale, you must update CustomerDim for all versions of that Customer. Eg

update CustomerDim set SalesPersonId = 1234 where CustomerBk = 'Acme'

Or else a query like

select sum(Amount)
from SalesFact s
join CustomerDim c
  on s.CustomerId = c.CustomerId
where c.SalesPersonId = 1234

won't return the older sales of that SalesPerson's customers.

By definition, all rows must be updated when an SCD1 attribute changes. If you only change the most recent version, it is an SCD2 update.

If you have a large number of version rows, you may consider splitting your SCD1 and SCD2 dimensions into separate tables. That does add an additional reference from the fact (Type 4) or require snowflaking the new SCD2 dimension to the new SCD1 dimension. Both options address issues with updates to a mixed dimension.

The approach is called Slowly Changing Dimensions, not Slowly Changing Attributes. Ignoring the complexity of higher levels, a dimension is either SCD0,1 or 2 - so if any attribute in an SCD2 dimension changes then you create a new record.

Trying to make attributes within a dimension have different SCD levels just gets you into the mess you've described.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM