Slowly Changing Fact Table?

Background) I've built a fact table for our inventory data that will, in theory, act as a nightly snapshot of our warehouse. It records information such as quantities, weights, locations, and statuses. The data is very granular and in many cases is not tied to a single entity: our source database identifies an inventory record by a three-part key (license plate, aka pallet; product; and packaging type), so it effectively has three business keys and no surrogate key.

The goal is a 100% accurate recreation of our warehouse management system's data, viewable for any single day in history. For example, I want to be able to look up how many pallets of product XYZ were in location 1234 on the 4th of August.

Question 1) I have built this fact table to look structurally like a Type 2 Slowly Changing Dimension. Is this wrong? I've been reading up a little on accumulating snapshot fact tables and I'm beginning to question my design. What is the best practice in this situation?

Question 2) If my design is OK, how do I configure Analysis Services so that it recognizes the DateStart and DateEnd columns in the fact table? I have found some information on how to configure this for dimensions, but it does not seem to work for, or apply to, fact tables.

For reference, my fact table's structure (with added notes about the columns):

CREATE TABLE [dbo].[FactInventory](
    [id] [int] IDENTITY(1,1) NOT NULL,     -- fact table only surrogate key
    [DateStart] [datetime] NULL,           -- record begin date
    [DateEnd] [datetime] NULL,             -- record end date
    [CreateDate] [datetime] NULL,          -- create date of the inventory record in src db
    [CreateDateId] [int] NULL,             -- create date dimension key
    [CreateTimeId] [int] NULL,             -- create time dimension key
    [LicensePlateId] [int] NULL,           -- pallet id dimension key
    [SerialNumberId] [int] NULL,           -- serial number id dimension key
    [PackagedId] [int] NULL,               -- packaging type id dimension key
    [LotId] [int] NULL,                    -- inventory lot id dimension key
    [MaterialId] [int] NULL,               -- product id dimension key
    [ProjectId] [int] NULL,                -- customer project id dimension key
    [OwnerId] [int] NULL,                  -- customer id dimension key
    [WarehouseId] [int] NULL,              -- warehouse id dimension key
    [LocationId] [int] NULL,               -- location id dimension key
    [LPStatusId] [int] NULL,               -- licenseplate status id dimension key
    [LPTypeId] [int] NULL,                 -- licenseplate type id dimension key
    [LPLookupCode] [nvarchar](128) NULL,   -- licenseplate non-system name
    [PackagedAmount] [money] NULL,         -- inventory amount - measure
    [netWeight] [money] NULL,              -- inventory netWeight - measure
    [grossWeight] [money] NULL,            -- inventory grossWeight - measure
    [Archived] [bit] NULL,                 -- inventory archived yes/no - dimension
    [SCDChangeReason] [nvarchar](128) NULL -- auditing data for changes
);
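To make the goal concrete, here is a minimal sketch of the kind of point-in-time lookup this DateStart/DateEnd structure implies. It is only an illustration: it uses Python's built-in sqlite3 in place of SQL Server so it can be run anywhere, keeps just a handful of the columns, and assumes that DateEnd is NULL for the current row and that intervals are half-open (a row stops applying on its DateEnd). The dates, ids, and the function name `pallets_on` are made up.

```python
import sqlite3


def pallets_on(conn, material_id, location_id, as_of):
    """Count distinct pallets of one product in one location at a point in time."""
    row = conn.execute(
        """
        SELECT COUNT(DISTINCT LicensePlateId)
        FROM FactInventory
        WHERE MaterialId = ?
          AND LocationId = ?
          AND DateStart <= ?
          AND (DateEnd IS NULL OR DateEnd > ?)
        """,
        (material_id, location_id, as_of, as_of),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
# Trimmed-down version of the table above; ISO date strings sort correctly as text.
conn.execute(
    """
    CREATE TABLE FactInventory (
        id INTEGER PRIMARY KEY,
        DateStart TEXT,
        DateEnd TEXT,
        LicensePlateId INTEGER,
        MaterialId INTEGER,
        LocationId INTEGER,
        PackagedAmount REAL
    )
    """
)
conn.executemany(
    "INSERT INTO FactInventory "
    "(DateStart, DateEnd, LicensePlateId, MaterialId, LocationId, PackagedAmount) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2012-08-01", "2012-08-05", 1, 10, 1234, 40),  # pallet 1 leaves location 1234 on the 5th
        ("2012-08-05", None, 1, 10, 9999, 40),          # ...and is now in location 9999
        ("2012-08-03", None, 2, 10, 1234, 25),          # pallet 2 is still in location 1234
    ],
)

pallets_on_aug_4 = pallets_on(conn, 10, 1234, "2012-08-04")
```

Every query against this shape needs the same pair of range predicates, which is the part I am unsure Analysis Services can express for a fact table.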

Typically, a snapshot fact table does not track changes: rows are written once and are not updated afterwards.

You usually have a date/time dimension that sets the grain of the measurements, rather than DateStart/DateEnd columns. Similarly, you do not carry any SCD information. The snapshot is taken, and the date and time dimension keys are attached to those facts. If the facts repeat identically each month, so be it.

Working out which facts are valid at a given time is more processing than you really want your DW or your ETL to handle. That kind of design (effective dates, etc.) is better suited to a live OLTP-type system in which the complete history is kept. The point of the DW is to optimize for reporting, not for space, so there is a direct snapshot date/time dimension that lets you easily index, and potentially partition, the data without a lot of date arithmetic or comparisons.
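As a rough sketch of what that looks like, here is a periodic snapshot table keyed by a snapshot date, queried with a plain equality filter. This uses Python's sqlite3 purely so the example is runnable; the table name `FactInventorySnapshot`, the `SnapshotDateId` column (a yyyymmdd integer key into a date dimension), the index, and the sample rows are all assumptions for illustration, not something taken from your schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE FactInventorySnapshot (
        SnapshotDateId INTEGER NOT NULL,  -- e.g. 20120804, key into a date dimension
        LicensePlateId INTEGER NOT NULL,
        MaterialId     INTEGER NOT NULL,
        LocationId     INTEGER NOT NULL,
        PackagedAmount REAL    NOT NULL
    );
    -- Leading with the snapshot date lets a day's rows be found by a simple seek.
    CREATE INDEX IX_Snapshot_Date
        ON FactInventorySnapshot (SnapshotDateId, MaterialId, LocationId);
    """
)

# Unchanged pallets are simply repeated in every nightly snapshot.
conn.executemany(
    "INSERT INTO FactInventorySnapshot VALUES (?, ?, ?, ?, ?)",
    [
        (20120803, 1, 10, 1234, 40.0),
        (20120804, 1, 10, 1234, 40.0),
        (20120804, 2, 10, 1234, 25.0),
        (20120805, 1, 10, 9999, 40.0),
        (20120805, 2, 10, 1234, 25.0),
    ],
)


def pallets_on_day(conn, snapshot_date_id, material_id, location_id):
    """Count pallets of a product in a location on one snapshot day."""
    return conn.execute(
        """
        SELECT COUNT(DISTINCT LicensePlateId)
        FROM FactInventorySnapshot
        WHERE SnapshotDateId = ? AND MaterialId = ? AND LocationId = ?
        """,
        (snapshot_date_id, material_id, location_id),
    ).fetchone()[0]


aug_4_count = pallets_on_day(conn, 20120804, 10, 1234)
```

The table is larger because unchanged rows repeat, but the query needs no range comparison, and the date key is a natural candidate for partitioning.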

As for your dimensional model, be careful not to succumb to the too-many-dimensions problem. Remember that dimensions do not have to correspond to entities in the real world. The choice of how dimensional attributes are grouped into dimension tables should be informed by 1) query needs, 2) data affinity and change behavior, and 3) business organization. You might want to look into using one or more junk dimensions.
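To illustrate the junk dimension idea, here is a small sketch that folds three low-cardinality attributes from the question (license plate status, license plate type, and the Archived flag) into a single dimension with one row per combination, so the fact table carries one key instead of three. The table name `DimInventoryFlags` and the status and type values are invented for the example; sqlite3 is used only to keep it runnable.

```python
import itertools
import sqlite3

# Hypothetical attribute values; the real ones would come from the source system.
STATUSES = ["Active", "On Hold", "Shipped"]
TYPES = ["Pallet", "Tote"]
ARCHIVED = [0, 1]


def build_junk_dimension(conn):
    """Create the junk dimension and pre-populate every attribute combination."""
    conn.execute(
        """
        CREATE TABLE DimInventoryFlags (
            InventoryFlagsId INTEGER PRIMARY KEY,
            LPStatus TEXT    NOT NULL,
            LPType   TEXT    NOT NULL,
            Archived INTEGER NOT NULL,
            UNIQUE (LPStatus, LPType, Archived)
        )
        """
    )
    conn.executemany(
        "INSERT INTO DimInventoryFlags (LPStatus, LPType, Archived) VALUES (?, ?, ?)",
        itertools.product(STATUSES, TYPES, ARCHIVED),
    )


def flags_key(conn, status, lp_type, archived):
    """Look up the single surrogate key the ETL would store on the fact row."""
    row = conn.execute(
        """
        SELECT InventoryFlagsId
        FROM DimInventoryFlags
        WHERE LPStatus = ? AND LPType = ? AND Archived = ?
        """,
        (status, lp_type, archived),
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
build_junk_dimension(conn)
key_for_active_pallet = flags_key(conn, "Active", "Pallet", 0)
```

Whether these particular attributes belong together depends on the three criteria above; the sketch only shows the mechanics.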

Before going any further, is inventory really a slowly changing fact?

Edit: Then why not just snapshot every product each day, since that's what you want?
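A nightly load along those lines could be as simple as the sketch below: copy the current state of the inventory into the snapshot table, stamped with the day's key. Everything here is an assumption for illustration, including the `StageInventory` staging table, the `FactInventorySnapshot` table, and the yyyymmdd integer day key. The delete before the insert is there so that re-running the job for the same day does not duplicate rows.

```python
import sqlite3
from datetime import date


def load_nightly_snapshot(conn, snapshot_day):
    """Copy the current inventory state into the snapshot fact for one day.

    Returns the number of snapshot rows stored for that day.
    """
    date_id = int(snapshot_day.strftime("%Y%m%d"))
    with conn:  # delete + insert run in a single transaction
        conn.execute(
            "DELETE FROM FactInventorySnapshot WHERE SnapshotDateId = ?",
            (date_id,),
        )
        conn.execute(
            """
            INSERT INTO FactInventorySnapshot
                (SnapshotDateId, LicensePlateId, MaterialId, LocationId, PackagedAmount)
            SELECT ?, LicensePlateId, MaterialId, LocationId, PackagedAmount
            FROM StageInventory
            """,
            (date_id,),
        )
    return conn.execute(
        "SELECT COUNT(*) FROM FactInventorySnapshot WHERE SnapshotDateId = ?",
        (date_id,),
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE StageInventory (
        LicensePlateId INTEGER, MaterialId INTEGER,
        LocationId INTEGER, PackagedAmount REAL
    );
    CREATE TABLE FactInventorySnapshot (
        SnapshotDateId INTEGER NOT NULL,
        LicensePlateId INTEGER, MaterialId INTEGER,
        LocationId INTEGER, PackagedAmount REAL
    );
    """
)
conn.executemany(
    "INSERT INTO StageInventory VALUES (?, ?, ?, ?)",
    [(1, 10, 1234, 40.0), (2, 10, 1234, 25.0)],
)

rows_loaded = load_nightly_snapshot(conn, date(2012, 8, 4))
```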

The problem is that fact tables get large, and you're throwing EVERYTHING into the fact table unnecessarily. Ideally, the fact table contains nothing more than foreign keys to dimensions and data that pertains only to the fact at hand. But some of the columns you've outlined look like they belong in one of the dimension tables instead.

For instance, the license plate information: status, type, and lookup code. Likewise with netWeight/grossWeight, which should be derivable from the product dimension and PackagedAmount.

CREATE TABLE [dbo].[FactInventory](
    [id] [int] IDENTITY(1,1) NOT NULL,     -- fact table only surrogate key
    [day] [int] NULL,                      -- day dimension key, grain of a day
    [CreateDateId] [int] NULL,             -- create date dimension key
    /* Are these really needed?
     * [CreateTimeId] [int] NULL,          -- create time dimension key
     * [CreateDate] [datetime] NULL,       -- create date of the inventory record in src db
     */
    [LicensePlateId] [int] NULL,           -- pallet id dimension key
    /* Now THESE dimension columns... possibly slowly changing dimensions?
    [LPStatusId] [int] NULL,               -- licenseplate status id dimension key
    [LPTypeId] [int] NULL,                 -- licenseplate type id dimension key
    [LPLookupCode] [nvarchar](128) NULL,   -- licenseplate non-system name
    */
    [SerialNumberId] [int] NULL,           -- serial number id dimension key
    [PackagedId] [int] NULL,               -- packaging type id dimension key
    [LotId] [int] NULL,                    -- inventory lot id dimension key
    [MaterialId] [int] NULL,               -- product id dimension key
    [ProjectId] [int] NULL,                -- customer project id dimension key
    [OwnerId] [int] NULL,                  -- customer id dimension key
    [WarehouseId] [int] NULL,              -- warehouse id dimension key
    [LocationId] [int] NULL,               -- location id dimension key
    [PackagedAmount] [money] NULL,         -- inventory amount - measure
    [netWeight] [money] NULL,              -- inventory netWeight - measure
    [grossWeight] [money] NULL,            -- inventory grossWeight - measure
    [Archived] [bit] NULL,                 -- inventory archived yes/no - dimension
    [SCDChangeReason] [nvarchar](128) NULL -- auditing data for changes
);
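On the point about weights being derivable: the sketch below shows one way that could work, assuming the product dimension carries a fixed net weight per packaged unit. That `NetWeightPerUnit` column, the `DimMaterial` table name, and the view `vInventoryWeight` are hypothetical; if your products are weighed individually (so weight is not a simple multiple of the amount), the measured weights would have to stay on the fact row. sqlite3 is used here only so the example runs as-is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimMaterial (
        MaterialId       INTEGER PRIMARY KEY,
        Name             TEXT,
        NetWeightPerUnit REAL  -- hypothetical attribute, not in the original schema
    );
    CREATE TABLE FactInventory (
        day            INTEGER,
        LicensePlateId INTEGER,
        MaterialId     INTEGER,
        PackagedAmount REAL
    );
    -- Derive the weight at query time instead of storing it on every fact row.
    CREATE VIEW vInventoryWeight AS
    SELECT f.day,
           f.LicensePlateId,
           f.MaterialId,
           f.PackagedAmount,
           f.PackagedAmount * m.NetWeightPerUnit AS netWeight
    FROM FactInventory AS f
    JOIN DimMaterial AS m ON m.MaterialId = f.MaterialId;
    """
)
conn.execute("INSERT INTO DimMaterial VALUES (10, 'XYZ', 2.5)")
conn.executemany(
    "INSERT INTO FactInventory VALUES (?, ?, ?, ?)",
    [(20120804, 1, 10, 40.0), (20120804, 2, 10, 25.0)],
)


def net_weights(conn, day):
    """Return (LicensePlateId, netWeight) pairs for one snapshot day."""
    return conn.execute(
        "SELECT LicensePlateId, netWeight FROM vInventoryWeight "
        "WHERE day = ? ORDER BY LicensePlateId",
        (day,),
    ).fetchall()


weights = net_weights(conn, 20120804)
```

A gross weight would additionally need a tare weight from the packaging dimension, which is left out here.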
