
How to populate a fact table with surrogate keys from dimensions

Could you please help me understand how to populate a fact table with surrogate keys from dimensions?

I have the following fact table and dimensions:

ClaimFacts

ContractDim_SK, ClaimDim_SK, AccountingDim_SK, ClaimNbr, ClaimAmount

ContractDim

ContractDim_SK (PK), ContractNbr (BK), ReportingPeriod (BK), Code, Name

AccountingDim

TransactionNbr (BK), ReportingPeriod (PK), TransactionCode, CurrencyCode (Should I add ContractNbr here? The original table in OLTP has it.)

ClaimDim

ClaimDim_SK (PK), ClaimNbr (BK), ReportingPeriod (BK), ClaimDesc, ClaimName (Should I add ContractNbr here? The original table in OLTP has it.)

My logic for loading data into the fact table is the following:

  1. First I load data into the dimensions (surrogate keys are created as identity columns).
  2. From the transactional model (OLTP), the fact table is filled with the measures (ClaimNbr and ClaimAmount).
  3. I don't know how to populate the fact table with the SKs of the dimensions. How do I know which fact table row a key pulled from a dimension belongs to (which key belongs to this ClaimNbr)? Should I add ContractNbr to all dimensions and join them together when loading keys into the fact table?

What's the right approach to do this? Please help. Thank you.

The way it usually works:

  1. In your dimensions, you will have "Natural Keys" (aka "Business Keys") - keys that come from external systems. For example, Contract Number. Then you create synthetic (surrogate) keys for the table.
  2. In your fact table, all keys initially must also be "Natural Keys". For example, Contract Number. Such keys must exist for each dimension that you want to connect to the fact table. Sometimes a dimension might need several natural keys (collectively, they represent the "granularity" level of the dimension table). For example, Location might need State and City keys if it is modeled at the State-City level.
  3. Join your dim table to the fact table on the natural keys, and from the result omit the natural key from the fact and select the surrogate key from the dim. I usually do a left join (fact left join dim) to control records that don't match. I also join dims one by one (to better control what's happening).

Basic example (using T-SQL). Let's say you have the following 2 tables:

Table OLTP.Sales
(   Contract_BK,
    Amount,
    Quantity)

Table Dim.Contract
(   Contract_SK,
    Contract_BK,
    Contract_Type)

To swap keys:

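-- Replace the natural key with the surrogate key while loading the fact table
SELECT
     c.Contract_SK
    ,s.Amount
    ,s.Quantity
INTO
    Fact.Sales
FROM
    OLTP.Sales s LEFT JOIN Dim.Contract c ON s.Contract_BK = c.Contract_BK;

-- Test for missing keys (source rows with no match in the dimension)
SELECT
    *
FROM
    Fact.Sales
WHERE
    Contract_SK IS NULL;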
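
The same key swap can be applied to the tables from the question. This is only a sketch under assumptions: OLTP.Claims is a hypothetical source table that carries the natural keys (ClaimNbr, ContractNbr, ReportingPeriod) next to ClaimAmount, and #ClaimFacts_Step1 is a hypothetical work table. ContractDim is matched on both of its business keys, since together they define its granularity.

```sql
-- Sketch only: OLTP.Claims and #ClaimFacts_Step1 are assumed names, not from the question.
SELECT
     c.ContractDim_SK      -- surrogate key looked up from the dimension
    ,s.ClaimNbr            -- natural key of the claim, kept in the fact row
    ,s.ClaimAmount
INTO
    #ClaimFacts_Step1
FROM
    OLTP.Claims s
    LEFT JOIN ContractDim c
        ON  s.ContractNbr     = c.ContractNbr
        AND s.ReportingPeriod = c.ReportingPeriod;

-- Claims whose contract was not found in the dimension
SELECT
    ClaimNbr
FROM
    #ClaimFacts_Step1
WHERE
    ContractDim_SK IS NULL;
```

Each further dimension would be joined to the work table in the same way, one join at a time, so that unmatched rows can be checked after every step.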

On a side note, I believe you have some mistakes in your design.

  • Report Period should be a separate dimension. Usually it's a calendar table with all date/period related attributes.
  • You certainly should not add ContractNbr to other dimensions. You already have this data in the Contract dimension. That's how a star schema works - contract attributes are always available to you via the fact table. No need to replicate them.
  • I can't say for sure (not enough information), but I suspect that the Accounting and Claim dimensions are incorrectly designed. If you intend to list individual transaction descriptions and individual claim attributes in them, it's a mistake: it will result in dimensions that are as large as the fact table. In a good design, fact tables are "tall and skinny", while dimensions are "short and fat". I.e., a fact table should have few fields and lots of records, while dimensions should have lots of fields and few records. Typically, if a dimension's record count is more than 10-20% of the fact table's record count, it's an indication of incorrect design. The correct way to handle this is to decompose claims into multiple dimensions and leave the claim number (order number, invoice number, transaction number, etc.) as a "degenerate dimension" in your fact table. It's a bit of an advanced topic, but you clearly need it for your case. The reason it's important: if your dimensions are as tall as your fact table, performance will get increasingly poor. If the number of transactions or claims is in the millions of records, it might be so slow that it will kill your design.
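
The 10-20% rule of thumb can be checked with a simple row-count comparison. A minimal sketch, assuming the ClaimDim and ClaimFacts tables from the question already exist and are loaded:

```sql
-- Compare the size of a dimension with the size of the fact table.
SELECT
     d.DimRows
    ,f.FactRows
    -- NULLIF avoids a divide-by-zero error when the fact table is empty
    ,CAST(100.0 * d.DimRows / NULLIF(f.FactRows, 0) AS decimal(9, 1)) AS DimPctOfFact
FROM
    (SELECT COUNT(*) AS DimRows  FROM ClaimDim)   d
    CROSS JOIN
    (SELECT COUNT(*) AS FactRows FROM ClaimFacts) f;
```

A DimPctOfFact value close to 100 would suggest that the dimension holds one row per fact row, which is the situation described above.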

If you need more information on this, I recommend this book:

Star Schema: The Complete Reference

[Edit to answer a follow-up question]:

I did not mean that you should remove the ClaimNbr field from the Claim dimension. I suggested that you don't need such a dimension at all.

This might be a bit hard to digest, but consider the following. "Claim" is essentially a container for information (same as "Invoice", "Order", etc.). If you move all useful pieces of data to their relevant dimensions, there should be nothing left but an empty container.

For example, let's assume that your OLTP claim table contains the following fields: Claim Number, Report Period, Claim Description, Claim Name, Contract Number, Claim Amount. You can model them as follows:

  • Report Period: becomes the business key for the "Date" dimension
  • Contract Number: becomes the business key for the "Contract" dimension
  • Claim Amount: stays in the fact table as a numeric (fully additive) fact

That leaves 3 fields: Claim Number, Claim Name and Claim Description. At this point, some designers create a "Claim" dimension and park these fields there. As I mentioned before, this is a mistake, because you will then have as many records in your dimension as in your fact table, leading to serious problems.

A better design is to leave these fields in the fact table. Claim Number becomes a "degenerate dimension" - a business key to an "empty" (non-existent) dimension. Essentially, it's just an ID for an information container, like an invoice number, order number, etc.

Claim Name and Claim Description should also stay in the fact table and become "non-numeric" (non-additive) facts. If you need to display them in a report, it's easy to do, and you can count them, apply conditional logic to them, measure their length, etc.
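
As an illustration, a fact table following this design might look like the sketch below. The table name, column names, and data types are all assumptions made for the example, and the Fact schema is assumed to exist.

```sql
-- Sketch only: names and data types are assumed, not taken from the question.
CREATE TABLE Fact.Claim
(
    Date_SK          int            NOT NULL,  -- surrogate key of the Date dimension (from Report Period)
    Contract_SK      int            NOT NULL,  -- surrogate key of the Contract dimension
    ClaimNbr         varchar(20)    NOT NULL,  -- degenerate dimension: no Claim dimension table behind it
    ClaimName        nvarchar(100)  NULL,      -- non-additive text fact
    ClaimDescription nvarchar(1000) NULL,      -- non-additive text fact
    ClaimAmount      decimal(18, 2) NOT NULL   -- fully additive numeric fact
);
```

ClaimNbr has no lookup table here; it stays in the fact row purely as an identifier that can be filtered or grouped on.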

Another way of looking at this: dimensions are usually used to "slice" (dissect) facts BY some attribute/field. For example, "Sale Amount by Country", "Product Costs by Plant Location", etc. But you can't slice by descriptions, notes, or other free text - it makes no sense.

What if your descriptions or other claim attributes are structured? For example, what if they are used to categorize/classify your claims? In that case, they are not free text; they are attributes that belong to a dimension. For example, you can design a "Claim Type" dimension, or a "Claim Status" dimension, etc. If there are too many of these little attribute fields, you can combine them into what's called a "junk" dimension (aka "Profile" dimension), i.e., a "Claim Profile" dimension. Such designs are clean and efficient.

Read more on junk dimensions here
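
A junk dimension of this kind could be built roughly as follows. This is a sketch under assumptions: ClaimType and ClaimStatus are hypothetical columns of a hypothetical OLTP.Claims source table, both are assumed to contain no NULLs, and the Dim schema is assumed to exist.

```sql
-- Sketch only: table and column names are assumed for illustration.
CREATE TABLE Dim.ClaimProfile
(
    ClaimProfile_SK int IDENTITY(1, 1) NOT NULL PRIMARY KEY,  -- surrogate key
    ClaimType       varchar(50) NOT NULL,
    ClaimStatus     varchar(50) NOT NULL
);

-- One row per combination of attribute values that occurs in the source
INSERT INTO Dim.ClaimProfile (ClaimType, ClaimStatus)
SELECT DISTINCT
     ClaimType
    ,ClaimStatus
FROM
    OLTP.Claims;
```

The fact table would then carry ClaimProfile_SK, looked up with the same kind of left join on (ClaimType, ClaimStatus) that is used for the other dimensions.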
