SCD2 WITH FACT TABLE IMPLEMENTATION
I have been asked to build a client dimension and a bed dimension, and to bring them together in the sense of clientID-SK, bedID_SK, Bed_begin_date, bed_end-date. Both tables contain SCD1 and SCD2 fields.
How do I implement this if the dates on which a client moved into and out of a bed have nothing to do with what defines a client or a bed (types)?
I have been able to combine them, but my challenge is that when I load them into a fact table, the table only has the begin_date. How will I update the fact table end_date, which is supposed to equal the begin_date of the next bed assignment?
e.g.
clientID, bedID, Start_Date, End_Date
10, ROO1, 01-19-2020, 3000-01-01 00:00:00.000

Dimension
10, ROO1, 01-19-2020, 10-19-2020
10, ROO2, 10-19-2020, 3000-01-01 00:00:00.000
We have a table called current bed that keeps track of our current clients, and I was able to build a slowly changing dimension off that table.
But we are concerned that, to follow standard practice, we have to have a star schema in place.
Any suggestions?
So you have, at least, the following tables:
When a bed is first occupied by a client, you create a new fact record and populate the Client, Bed and Date Occupied FKs. You populate the Bed Vacated FK with 0 (or whatever key value you have used in the Date Dim to indicate the 'unknown' record).
When a bed is next occupied, you create a new fact record for the new client and update the Bed Vacated FK on the previous record with the relevant Date key.
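The two steps above could be sketched roughly as follows, using Python's built-in sqlite3 module so the example is self-contained. The table name fact_bed_occupancy, its column names, the yyyymmdd integer date keys and the helper occupy_bed are all assumptions made for illustration; 0 stands for the 'unknown' Date Dim key.

```python
import sqlite3

UNKNOWN_DATE_SK = 0  # assumed key of the 'unknown' row in the Date dimension


def occupy_bed(conn, client_sk, bed_sk, date_sk):
    """Close the bed's open fact row (if any), then insert the new occupancy."""
    with conn:  # one transaction: both statements commit or neither does
        conn.execute(
            "UPDATE fact_bed_occupancy SET date_vacated_sk = ? "
            "WHERE bed_sk = ? AND date_vacated_sk = ?",
            (date_sk, bed_sk, UNKNOWN_DATE_SK),
        )
        conn.execute(
            "INSERT INTO fact_bed_occupancy "
            "(client_sk, bed_sk, date_occupied_sk, date_vacated_sk) "
            "VALUES (?, ?, ?, ?)",
            (client_sk, bed_sk, date_sk, UNKNOWN_DATE_SK),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_bed_occupancy ("
    "client_sk INTEGER, bed_sk INTEGER, "
    "date_occupied_sk INTEGER, date_vacated_sk INTEGER)"
)
occupy_bed(conn, client_sk=10, bed_sk=1, date_sk=20200119)
occupy_bed(conn, client_sk=11, bed_sk=1, date_sk=20201019)
rows = conn.execute(
    "SELECT client_sk, bed_sk, date_occupied_sk, date_vacated_sk "
    "FROM fact_bed_occupancy ORDER BY date_occupied_sk"
).fetchall()
```

After the second call, the first row carries the new occupancy date as its vacated key and only the latest row for the bed still points at the 'unknown' date.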
A few things to think about:
Hope this helps?
Update 1 following response
If you need to include Time then you need a Time dimension and 2 additional keys in the fact for occupied and vacated time.
I'm not sure I understand your question about how you update the fact table. You have the information required to identify the fact record (bed id and vacated date key = 0) and the value needed to update the fact record. What am I missing?
UPDATE 2
I think you need to take a step back and think clearly about what it is you are trying to achieve - then the answers to your questions should become more obvious.
The first question you need to ask is what you are trying to measure: once you have clearly defined that, the grain of the fact table is established and it becomes clearer which changes in attributes you need to handle. For example:
So to try and answer your questions...
If there is a change in dimension attributes and it affects your fact table then you'll need to handle this, e.g. by inserting or updating a fact record:
Records in your fact table should never overlap each other, so at any point in time there is only one active fact record per bed and per patient. Each time you insert a new fact record you would expire the previous applicable fact record.
So when you ask "The update to the end_date when the client moves to a new bed will be on all 3 added surrogate key rows?", the answer is no. You would have set the end date on the first 2 records when you created the next record each time, i.e. set the end date of record 1 when you create record 2, set the end date of record 2 when you create record 3, etc.; so you will only be updating the last record when the client moves.
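The answer describes setting each end date incrementally, as the next record is created. If the end dates are ever missing (for example after a bulk load that carried only begin dates), the same chain could instead be rebuilt in one pass, since each record's end date is the start date of the same client's next record. This is a batch alternative rather than the incremental approach described above, and only a minimal sketch: the table fact_bed, its columns, the third bed ROO3 and the use of ISO date strings (so that text comparison orders correctly) are assumptions for illustration.

```python
import sqlite3

OPEN_END = "3000-01-01"  # open-ended marker, as used in the question

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_bed (client_sk INTEGER, bed_sk TEXT, "
    "start_date TEXT, end_date TEXT)"
)
conn.executemany(
    "INSERT INTO fact_bed VALUES (?, ?, ?, ?)",
    [
        (10, "ROO1", "2020-01-19", OPEN_END),
        (10, "ROO2", "2020-10-19", OPEN_END),
        (10, "ROO3", "2020-11-02", OPEN_END),  # hypothetical third move
    ],
)

# For each row, find the earliest later start_date for the same client;
# COALESCE keeps the open-ended marker on the client's latest row.
conn.execute(
    """
    UPDATE fact_bed
    SET end_date = COALESCE(
        (SELECT MIN(nxt.start_date)
         FROM fact_bed AS nxt
         WHERE nxt.client_sk = fact_bed.client_sk
           AND nxt.start_date > fact_bed.start_date),
        ?)
    """,
    (OPEN_END,),
)
rows = conn.execute(
    "SELECT bed_sk, start_date, end_date FROM fact_bed ORDER BY start_date"
).fetchall()
```

Only the client's most recent record keeps the open-ended date, which matches the rule that there is a single active fact record at any point in time.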
Adding a PK to a fact table is only required when there is a requirement to update the fact table, as is the case here. Whether you do so is a choice, but I would look at how complicated the compound key is, i.e. how many SKs you need to use to identify the correct fact record to be updated. In your case you only need the Bed SK and the end_date = null (or 31/12/3000 or however you have chosen to set it), so there is probably no benefit in defining a single PK field on the fact table. If you needed more than about 5 SKs to identify a fact record then there is probably a case for using a single PK field.
Mini-dimensions: these just seem to be more, unnecessary complication, but I can't really comment unless you can clearly articulate what the issue is that you think mini-dimensions will solve and why you think mini-dimensions are a solution to the issue.
You seem to be confused about the effective dates on an SCD2 dimension and foreign keys on a Fact table referencing the Date dimension, as they are very different things.
Date FKs on a Fact are attributes that you have chosen to record for that fact. In your example, for each bed occupancy fact (i.e. a single record in your fact table) you might have "Date Occupied" and "Date Vacated" attributes/FKs that reference the Date Dimension. When a fact record is created you would populate the "Date Occupied" field with the appropriate date and the "Date Vacated" field with "0" (or whatever value points to the "Unknown" record in your Date Dimension). When the bed becomes unoccupied you update the fact record and set the "Date Vacated" field to the appropriate date.
Because you need to record 2 different dates against the fact, you need to have two FKs referencing the Date dimension; you couldn't record the Date Occupied and the Date Vacated using a single reference to the Date Dimension.
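To illustrate the two references, here is a small sketch in which the fact table joins to the same Date dimension twice, once per FK. The names date_dim, fact_bed_occupancy, date_occupied_sk and date_vacated_sk, and the yyyymmdd key format, are assumptions for the example; key 0 plays the role of the "Unknown" date record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE date_dim (date_sk INTEGER PRIMARY KEY, calendar_date TEXT);
    CREATE TABLE fact_bed_occupancy (
        bed_sk INTEGER,
        date_occupied_sk INTEGER REFERENCES date_dim (date_sk),
        date_vacated_sk INTEGER REFERENCES date_dim (date_sk));
    INSERT INTO date_dim VALUES
        (0, NULL), (20200119, '2020-01-19'), (20201019, '2020-10-19');
    INSERT INTO fact_bed_occupancy VALUES
        (1, 20200119, 20201019), (1, 20201019, 0);
    """
)
# The Date dimension is joined once per role: occupied and vacated.
rows = conn.execute(
    """
    SELECT f.bed_sk, occ.calendar_date, vac.calendar_date
    FROM fact_bed_occupancy AS f
    JOIN date_dim AS occ ON f.date_occupied_sk = occ.date_sk
    JOIN date_dim AS vac ON f.date_vacated_sk = vac.date_sk
    ORDER BY f.date_occupied_sk
    """
).fetchall()
```

Because the still-occupied row points at the "Unknown" date record rather than holding a NULL key, it survives the inner join and simply shows no vacated date.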
The same type of thinking applies when you want to have an FK on a fact table that references an SCD2 dimension; you need to decide what the point-in-time context of that reference is and then link to the correct version of the record in the SCD2 dimension. So if you want to record the state of the patient at the point they occupy the bed, then you pick their record in the dimension where Fact.DateOccupied is between Dim.EffStartDate and Dim.EffEndDate. If you also want to record the state of the patient at a different (but specific) time, such as when the bed was vacated, then you would need to add a separate FK to the fact table to hold this additional reference to the Patient Dim.
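The lookup described above might be sketched like this at load time. The patient_dim layout, the status attribute, the sample rows and the helper patient_sk_as_of are hypothetical; dates are stored as ISO strings so that BETWEEN compares them correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient_dim (patient_sk INTEGER PRIMARY KEY, "
    "patient_bk TEXT, status TEXT, eff_start_date TEXT, eff_end_date TEXT)"
)
conn.executemany(
    "INSERT INTO patient_dim VALUES (?, ?, ?, ?, ?)",
    [
        (1, "P10", "outpatient", "2019-01-01", "2020-06-30"),
        (2, "P10", "inpatient", "2020-07-01", "3000-01-01"),
    ],
)


def patient_sk_as_of(conn, patient_bk, as_of_date):
    """Return the SK of the dimension version in effect on as_of_date."""
    row = conn.execute(
        "SELECT patient_sk FROM patient_dim "
        "WHERE patient_bk = ? AND ? BETWEEN eff_start_date AND eff_end_date",
        (patient_bk, as_of_date),
    ).fetchone()
    return row[0] if row else None


# One lookup per point-in-time context, each feeding its own FK on the fact.
sk_at_occupied = patient_sk_as_of(conn, "P10", "2020-01-19")
sk_at_vacated = patient_sk_as_of(conn, "P10", "2020-10-19")
```

The two calls return different surrogate keys for the same patient, which is why each point-in-time context needs its own FK column on the fact table.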
Having populated your fact table, if you want to know the state of the patient at a specific point in time you don't need to do anything to the fact table; instead you need to join the Patient Dim to itself. e.g.
Pseudo-code SQL for this would look something like (assuming you wanted to know the state of the patient on '2020-11-17'):
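-- P1 is the patient version the fact row references;
-- P2 is the version of the same patient in effect on the chosen date.
SELECT
    P2.*
FROM
    FACT_TABLE F
    INNER JOIN PATIENT_DIM P1
        ON F.PATIENT_SK = P1.PATIENT_SK
    INNER JOIN PATIENT_DIM P2
        ON P1.PATIENT_BK = P2.PATIENT_BK
        AND P2.EFF_START_DATE <= '2020-11-17'
        AND P2.EFF_END_DATE >= '2020-11-17'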
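For anyone who wants to try the self-join against real rows, here is a small runnable sketch using Python's sqlite3 module. The sample data, the ward attribute and the lower-case table and column names are invented for illustration; only the join shape follows the pseudo-code above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patient_dim (
        patient_sk INTEGER PRIMARY KEY, patient_bk TEXT, ward TEXT,
        eff_start_date TEXT, eff_end_date TEXT);
    CREATE TABLE fact_table (patient_sk INTEGER, bed_sk INTEGER);
    INSERT INTO patient_dim VALUES
        (1, 'P10', 'general',   '2020-01-01', '2020-10-31'),
        (2, 'P10', 'intensive', '2020-11-01', '3000-01-01');
    -- The fact row points at version 1, the version current at occupancy.
    INSERT INTO fact_table VALUES (1, 100);
    """
)
rows = conn.execute(
    """
    SELECT p2.patient_sk, p2.ward
    FROM fact_table AS f
    JOIN patient_dim AS p1 ON f.patient_sk = p1.patient_sk
    JOIN patient_dim AS p2 ON p1.patient_bk = p2.patient_bk
        AND p2.eff_start_date <= :as_of
        AND p2.eff_end_date >= :as_of
    """,
    {"as_of": "2020-11-17"},
).fetchall()
```

Although the fact row references version 1 of the patient, the query returns version 2, the one in effect on the requested date, without touching the fact table.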
Hope this helps?