SCD2 with fact table implementation

I have been asked to build a client dimension and a bed dimension and bring them together in the form clientID-SK, bedID_SK, Bed_begin_date, bed_end-date. Both tables contain SCD1 and SCD2 fields. How do I implement this if the dates the client was in and out of a bed have nothing to do with what defines a client or a bed (types)?

I have been able to combine them, but my challenge is that when I load them into a fact table, the table only has the begin_date. How do I update the fact table end_date, which is supposed to equal the begin_date of the next bed assignment?

e.g.

clientID,bedID,Start_Date,End_Date
10 ,ROO1, ,01-19-2020, 3000-01-01 00:00:00.000

Dimension

10 ,ROO1, ,01-19-2020, 10-19-2020
10 ,ROO2, ,10-19-2020, 3000-01-01 00:00:00.000

We have a table called current bed that keeps track of our current clients, and I was able to build a slowly changing dimension off that table.

However, to follow standard practice we need to have a star schema in place.

Any suggestions?

So you have, at least, the following tables:

  1. Client Dimension holding all the client attributes
  2. Bed Dimension holding all the bed attributes
  3. A Date Dimension
  4. A Bed Occupancy Fact with FKs to Client Dim, Bed Dim and 2 FKs to Date Dim (one for bed occupied and one for bed vacated)
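
The four tables above could be laid out roughly as follows. This is only a sketch, run through Python's built-in sqlite3 module so it can be executed as-is. All table and column names are assumptions made for illustration (the question does not give the real ones), and the dimensions carry only the keys and effective dates, not the real attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE date_dim (
    date_sk   INTEGER PRIMARY KEY,  -- 0 is reserved for the 'unknown' date
    full_date TEXT                  -- ISO date; NULL on the 'unknown' row
);
CREATE TABLE client_dim (
    client_sk      INTEGER PRIMARY KEY,  -- surrogate key, one per SCD2 version
    client_id      INTEGER NOT NULL,     -- business key
    eff_start_date TEXT NOT NULL,
    eff_end_date   TEXT NOT NULL
);
CREATE TABLE bed_dim (
    bed_sk         INTEGER PRIMARY KEY,
    bed_id         TEXT NOT NULL,
    eff_start_date TEXT NOT NULL,
    eff_end_date   TEXT NOT NULL
);
CREATE TABLE bed_occupancy_fact (
    client_sk        INTEGER NOT NULL REFERENCES client_dim (client_sk),
    bed_sk           INTEGER NOT NULL REFERENCES bed_dim (bed_sk),
    date_occupied_sk INTEGER NOT NULL REFERENCES date_dim (date_sk),
    date_vacated_sk  INTEGER NOT NULL DEFAULT 0 REFERENCES date_dim (date_sk)
);
INSERT INTO date_dim VALUES (0, NULL);  -- the 'unknown' date row
""")

# Inspect the fact table: its columns and the tables its FKs point to.
fact_columns = [row[1] for row in conn.execute("PRAGMA table_info(bed_occupancy_fact)")]
fk_targets = sorted(row[2] for row in conn.execute("PRAGMA foreign_key_list(bed_occupancy_fact)"))
```

Note that the fact table references the Date Dim twice, once per role (occupied and vacated).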

When a bed is first occupied by a client you create a new fact record and populate the Client, Bed and Date Occupied FKs. You populate the Bed Vacated FK with 0 (or whatever key value you have used in the Date Dim to indicate the 'unknown' record).

When a bed is next occupied, you create a new fact record for the new client and update the Bed Vacated FK on the previous record with the relevant Date key.
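
A minimal sketch of that insert-then-update cycle, again using sqlite3 from Python. The table layout, the helper names `occupy_bed` and `vacate_bed`, and the yyyymmdd-style integer date keys are all assumptions for illustration; 0 stands in for whatever 'unknown' key your Date Dim uses.

```python
import sqlite3

UNKNOWN_DATE_SK = 0  # assumed key of the 'unknown' row in the Date Dim

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE bed_occupancy_fact (
        client_sk        INTEGER NOT NULL,
        bed_sk           INTEGER NOT NULL,
        date_occupied_sk INTEGER NOT NULL,
        date_vacated_sk  INTEGER NOT NULL DEFAULT 0
    )
    """
)


def vacate_bed(conn, bed_sk, vacated_sk):
    """Set the vacated key on the open fact row for this bed; return rows changed."""
    cur = conn.execute(
        "UPDATE bed_occupancy_fact SET date_vacated_sk = ? "
        "WHERE bed_sk = ? AND date_vacated_sk = ?",
        (vacated_sk, bed_sk, UNKNOWN_DATE_SK),
    )
    return cur.rowcount


def occupy_bed(conn, client_sk, bed_sk, occupied_sk):
    """Insert a new open fact row (vacated key = 'unknown')."""
    conn.execute(
        "INSERT INTO bed_occupancy_fact "
        "(client_sk, bed_sk, date_occupied_sk, date_vacated_sk) "
        "VALUES (?, ?, ?, ?)",
        (client_sk, bed_sk, occupied_sk, UNKNOWN_DATE_SK),
    )


# First occupancy of bed 1: nothing to close, just insert.
with conn:
    occupy_bed(conn, client_sk=10, bed_sk=1, occupied_sk=20200119)

# Next occupancy: close the previous row and insert the new one in one transaction.
with conn:
    closed = vacate_bed(conn, bed_sk=1, vacated_sk=20201018)
    occupy_bed(conn, client_sk=11, bed_sk=1, occupied_sk=20201019)

rows = conn.execute(
    "SELECT client_sk, bed_sk, date_occupied_sk, date_vacated_sk "
    "FROM bed_occupancy_fact ORDER BY date_occupied_sk"
).fetchall()
```

The open record is found using only the bed key and the 'unknown' vacated key, so no separate primary key on the fact table is needed for the update.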

A few things to think about:

  1. Are you only working at the Date level of granularity or at Time level, i.e. are you interested in what time of day (or morning/afternoon, etc.) a bed was occupied/vacated?
  2. I would ensure that the Date Vacated of the previous occupancy and the Date Occupied of the current one are not the same value; otherwise you can get double counting on that overlapping date unless you implement logic to prevent it. For example, if a bed is occupied on 25th Sept then set the Vacated date of the previous record to 24th Sept
  3. Can you have periods when a bed is unoccupied? If you can, then I would create a fact record for this in exactly the same way as you would for an occupied bed, but set the client ID FK to 0 (or whatever value you use in the Client Dim to indicate a "not applicable" client)
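
If the source only supplies begin dates, as in the question, the end dates can be derived in one pass following point 2 above: each row ends the day before the same client's next assignment begins. The sketch below assumes a hypothetical staging table `bed_assignment` with ISO-formatted dates, reuses the sample rows from the question, and uses 3000-01-01 as the open-ended end date.

```python
import sqlite3

OPEN_END_DATE = "3000-01-01"  # open-ended marker, as used in the question

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bed_assignment (client_id INTEGER, bed_id TEXT, start_date TEXT);
INSERT INTO bed_assignment VALUES
    (10, 'ROO1', '2020-01-19'),
    (10, 'ROO2', '2020-10-19');
""")

rows = conn.execute(
    """
    SELECT a.client_id,
           a.bed_id,
           a.start_date,
           -- day before the client's next assignment, or the open-ended marker
           COALESCE(
               (SELECT date(MIN(n.start_date), '-1 day')
                FROM bed_assignment n
                WHERE n.client_id = a.client_id
                  AND n.start_date > a.start_date),
               ?
           ) AS end_date
    FROM bed_assignment a
    ORDER BY a.client_id, a.start_date
    """,
    (OPEN_END_DATE,),
).fetchall()
```

Matching on `bed_id` instead of `client_id` in the subquery would give the per-bed view of the same rule.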

Hope this helps.

UPDATE 1 - following response

If you need to include Time then you need a Time dimension and 2 additional keys in the fact for occupied and vacated time.

I'm not sure I understand your question about how you update the fact table. You have the information required to identify the fact record (bed id and vacated date key = 0) and the value needed to update the fact record. What am I missing?

UPDATE 2

I think you need to take a step back and think clearly about what it is you are trying to achieve - then the answers to your questions should become more obvious.

The first question you need to ask is what you are trying to measure: once you have clearly defined that, the grain of the fact table is established and it becomes clearer what changes in attributes you need to handle. For example:

  1. If you just want to know the status of a bed every time the occupant changes, and only the status of the occupant when they first use the bed (or last use the bed), then you only need to add a fact record when the bed occupancy changes and there is no need to record any updates during that patient's occupancy
  2. If you want to know the state of the bed at any point in time then first you need to define what you mean by "any point in time": every day, hour, minute, etc.? Then you need to decide what you want to record if there are multiple changes in that time period, e.g. the position at the start of the hour or at the end of the hour. Based on these decisions, you then need to work out if there have been any changes during that time period and, if there have been, insert/update the relevant records
  3. If you want to treat each patient's occupancy of a bed as a single fact then your fact record obviously has start and end dates, but you also need to decide which single state you are going to record for any attributes that can change over that period - you can record the patient's status at the start or end of the occupancy but not throughout the occupancy, as that would affect the grain of the fact table

So to try and answer your questions...

If there is a change in dimension attributes and it affects your fact table then you'll need to handle this, e.g. by inserting or updating a fact record:

  • If you are only interested in the state of the patient at the start or end of the occupancy then any change to the patient's attributes during the occupancy can be ignored
  • If you are interested in the state of the patient at any point in the occupancy then you'll need to make changes to the fact table whenever one of the patient's attributes changes

Records in your fact table should never overlap each other - so at any point in time there is only one active fact record per bed and per patient. Each time you insert a new fact record you would expire the previous applicable fact record.
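
One way to check the no-overlap rule after a load is a self-join that looks for two records on the same bed whose date ranges intersect. The sketch below is an assumption-laden illustration: it stores ISO dates directly on a hypothetical fact table instead of date keys, and adds a `fact_id` column purely so that each pair is reported once. The third sample row is deliberately wrong.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bed_occupancy_fact (
    fact_id    INTEGER PRIMARY KEY,
    client_sk  INTEGER,
    bed_sk     INTEGER,
    start_date TEXT,
    end_date   TEXT
);
INSERT INTO bed_occupancy_fact VALUES
    (1, 10, 1, '2020-01-19', '2020-10-18'),
    (2, 11, 1, '2020-10-19', '3000-01-01'),
    (3, 12, 1, '2020-10-01', '2020-10-05');  -- deliberately overlaps record 1
""")

# Two inclusive ranges intersect when each one starts on or before the other ends.
overlaps = conn.execute(
    """
    SELECT a.fact_id, b.fact_id
    FROM bed_occupancy_fact a
    JOIN bed_occupancy_fact b
      ON  a.bed_sk = b.bed_sk
      AND a.fact_id < b.fact_id
      AND a.start_date <= b.end_date
      AND b.start_date <= a.end_date
    ORDER BY a.fact_id, b.fact_id
    """
).fetchall()
```

Here `overlaps` contains only the pair (1, 3). Records 1 and 2 do not overlap because record 1 ends the day before record 2 starts. The same query with the patient business key in place of `bed_sk` would cover the per-patient half of the rule.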

So when you ask "The update to the end_date when the client moves to a new bed will be on all 3 added surrogate key rows?", the answer is no - you would have set the end date on the first 2 records when you created the next record each time, i.e. set the end date of record 1 when you create record 2, set the end date of record 2 when you create record 3, etc.; so you will only be updating the last record when the client moves.

Adding a PK to a fact table is only required when there is a requirement to update the fact table - as is the case here. Whether you do so is a choice - but I would look at how complicated the compound key is, i.e. how many SKs you need to use to identify the correct fact record to be updated. In your case you only need the Bed SK and end_date = null (or 31/12/3000 or however you have chosen to set it), so there is probably no benefit in defining a single PK field on the fact table. If you needed more than about 5 SKs to identify a fact record then there is probably a case for using a single PK field.

UPDATE 3 - following comment added on 17/11/2020

Mini-dimensions: these just seem to be more, unnecessary complication, but I can't really comment unless you can clearly articulate what issue you think mini-dimensions will solve and why you think they are a solution to it.

Dates

You seem to be confusing the effective dates on an SCD2 dimension with the foreign keys on a Fact table that reference the Date dimension - they are very different things.

Date FKs on a Fact are attributes that you have chosen to record for that fact. In your example, for each bed occupancy fact (i.e. a single record in your fact table) you might have "Date Occupied" and "Date Vacated" attributes/FKs that reference the Date Dimension. When a fact record is created you would populate the "Date Occupied" field with the appropriate date and the "Date Vacated" field with "0" (or whatever value points to the "Unknown" record in your Date Dimension). When the bed becomes unoccupied you update the fact record and set the "Date Vacated" field to the appropriate date.

Because you need to record 2 different dates against the fact, you need to have two FKs referencing the Date dimension; you couldn't record the Date Occupied and the Date Vacated using a single reference to the Date Dimension.

The same type of thinking applies when you want to have an FK on a fact table that references an SCD2 dimension; you need to decide what the point-in-time context of that reference is and then link to the correct version of the record in the SCD2 dimension. So if you want to record the state of the patient at the point they occupy the bed, you pick their record in the dimension where Fact.DateOccupied is between Dim.EffStartDate and Dim.EffEndDate. If you also want to record the state of the patient at a different (but specific) time, such as when the bed was vacated, then you would need to add a separate FK to the fact table to hold this additional reference to the Patient Dim.
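
As a rough illustration of that lookup, the sketch below resolves a patient's business key to the surrogate key of the SCD2 version that was effective on a given date. The table, the `status` attribute, the helper name `patient_sk_as_of` and the sample rows are all hypothetical, and the effective date ranges are assumed to be inclusive and non-overlapping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patient_dim (
    patient_sk     INTEGER PRIMARY KEY,  -- one surrogate key per SCD2 version
    patient_bk     INTEGER NOT NULL,     -- business key
    status         TEXT,                 -- an SCD2-tracked attribute (made up)
    eff_start_date TEXT NOT NULL,
    eff_end_date   TEXT NOT NULL
);
INSERT INTO patient_dim VALUES
    (101, 10, 'admitted',   '2020-01-01', '2020-06-30'),
    (102, 10, 'recovering', '2020-07-01', '3000-01-01');
""")


def patient_sk_as_of(conn, patient_bk, as_of_date):
    """Return the SK of the patient version effective on as_of_date, or None."""
    row = conn.execute(
        "SELECT patient_sk FROM patient_dim "
        "WHERE patient_bk = ? AND ? BETWEEN eff_start_date AND eff_end_date",
        (patient_bk, as_of_date),
    ).fetchone()
    return row[0] if row else None


# Two point-in-time contexts for the same occupancy give two different SKs.
sk_when_occupied = patient_sk_as_of(conn, 10, "2020-01-19")
sk_when_vacated = patient_sk_as_of(conn, 10, "2020-10-18")
```

With these sample rows the occupied-date lookup returns 101 and the vacated-date lookup returns 102, which is why each context needs its own FK on the fact.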

Having populated your fact table, if you want to know the state of the patient at a specific point in time you don't need to do anything to the fact table; instead you need to join the Patient Dim to itself, e.g.

  1. The fact table holds an FK that references a record in the Patient Dim
  2. From this Patient Dim record you can get the patient's BK
  3. Join from this BK back to the Patient Dim and filter on the date that you want to get the patient's details for

Pseudo-code SQL for this would look something like the following (assuming you wanted to know the state of the patient on '2020-11-17'):

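-- Patient details as of 2020-11-17 for every patient referenced by the fact table
SELECT
    P2.*
FROM
    FACT_TABLE F
    -- P1: the patient version that the fact record points to
    INNER JOIN PATIENT_DIM P1
        ON F.PATIENT_SK = P1.PATIENT_SK
    -- P2: the version of the same patient (same BK) effective on the chosen date
    INNER JOIN PATIENT_DIM P2
        ON  P1.PATIENT_BK      = P2.PATIENT_BK
        AND P2.EFF_START_DATE <= '2020-11-17'
        AND P2.EFF_END_DATE   >= '2020-11-17';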

Hope this helps.
