简体   繁体   中英

What should happen to a SCD 2 table record if the record becomes inactive?

Let's say we have a table that has these values:

id| name| country| start_dt| end_dt| is_current| current_dt
1001| John| CA| 2001-01-10| 2012-06-01| TRUE| 2012-06-01

Next day, the same record comes from upstream and the SCD2 record modifies to:

id| name| country| start_dt| end_dt| is_current| partition_dt
1001| John| CA| 2001-01-10| 2012-06-01| FALSE| 2012-06-02
1001| John| US| 2012-06-01| 2016-06-02| TRUE| 2012-06-02

But on the third day, this record stops coming from upstream table.

In this case, what should happen in the SCD2 table?

Scenario 1: The record remains as it is.

id| name| country| start_dt| end_dt| is_current| partition_dt
1001| John| CA| 2001-01-10| 2012-06-01| FALSE| 2012-06-03
1001| John| US| 2012-06-01| 9999-12-31| TRUE| 2012-06-03

Scenario 2: The record becomes inactive in the SCD2 table

id| name| country| start_dt| end_dt| is_current| partition_dt
1001| John| CA| 2001-01-10| 2012-06-01| FALSE| 2012-06-03
1001| John| US| 2012-06-01| 2012-06-03| FALSE| 2012-06-03

You leave the last version as IS_CURRENT=TRUE.

The last-known-good version of an SCD row is sometimes used instead of the time-sensitive version. So you can write

select d.Name, sum(f.someval) totalVal
from somefact f
join somedim d
  on f.dimid = d.id 
  and d.is_current = 'TRUE'

for joins to the dimension that treat it as a non-SCD.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM