
Fact Table with Different Update Schedules

I have two sets of data with the same level of granularity, for example invoice number. Most of the data required is updated daily as we recognize the revenue for previous invoices. However, some of this data is fed through a separate costing system once a month and is then fed to the data warehouse with additional information. Should I create one fact table that contains both sets of data and run an update on it once a month when the other data is imported, or should I create two fact tables because of the different update schedules? The data is related, and many queries (~35%) will want information from both sets of data (when available). The system imports 30,000 rows a day into the fact table, which has about 38,000,000 rows in it; the monthly update would affect 660,000 rows.

Provided that the already existing measures are not modified in the second step, you could treat the fact table as an "accumulating snapshot". This kind of table describes processes with a definite start and end, such as workflows. Look it up in Kimball's Data Warehouse Toolkit, or just Google "Kimball accumulating snapshot fact table".
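To make the idea concrete, here is a minimal sketch of how a single accumulating-snapshot table could absorb both loads. It uses an in-memory SQLite database purely for illustration. The table name `fact_invoice`, the columns `revenue`, `cost`, and `cost_loaded_on`, and the helpers `load_daily` and `load_monthly_costs` are all assumptions and do not come from the question. The costing measures start out as NULL and are filled in by the monthly step, which never touches the daily measures.

```python
import sqlite3

# Hypothetical schema: one row per invoice (the shared grain).
# The monthly costing columns are nullable until that feed arrives.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE fact_invoice (
        invoice_number INTEGER PRIMARY KEY,
        revenue        REAL NOT NULL,  -- loaded by the daily feed
        cost           REAL,           -- filled by the monthly costing feed
        cost_loaded_on TEXT            -- milestone date for the second step
    )
""")


def load_daily(rows):
    """Insert new invoices; costing columns stay NULL for now."""
    conn.executemany(
        "INSERT INTO fact_invoice (invoice_number, revenue) VALUES (?, ?)",
        rows,
    )


def load_monthly_costs(rows, load_date):
    """Fill in costing measures without modifying the daily measures."""
    conn.executemany(
        "UPDATE fact_invoice SET cost = ?, cost_loaded_on = ? "
        "WHERE invoice_number = ? AND cost IS NULL",
        [(cost, load_date, invoice) for invoice, cost in rows],
    )


# Daily load: three invoices arrive with revenue only.
load_daily([(1001, 250.0), (1002, 400.0), (1003, 125.0)])

# Monthly load: costs arrive for two of them.
load_monthly_costs([(1001, 180.0), (1002, 310.0)], "2024-01-31")
conn.commit()

# A query that wants both sets of data needs no join;
# invoices without costs yet simply show NULL (None in Python).
rows = conn.execute(
    "SELECT invoice_number, revenue, cost, revenue - cost AS margin "
    "FROM fact_invoice ORDER BY invoice_number"
).fetchall()
for row in rows:
    print(row)
```

With this layout, the queries that need both sets of data read one table, and the "when available" case falls out naturally as NULL costing columns. Whether a monthly update of that size is acceptable on a table this large depends on the database engine and its indexing, which this sketch does not address.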
