
Cross-reference facts and dimensions in a data warehouse

I am trying to design a data warehouse for a licensing vendor who sells licenses through ecommerce and various other venues. The things they want to track are sales, product lifecycle and activity. This means there are different sale types (such as new purchase, promotional purchase, renewal) and different events/states of a license: a license can get installed, renewed, activated, registered. A license can get renewed many times (on different dates).

So I was thinking my dimensions would be very simple: date, product, source, saletype and event/state. I would have two fact tables, one for sales and another for the events, both having foreign keys to the dimension tables. Each fact table would be an accumulating fact table, where every event would add a new row, so a license can be repeated. However, the requirements state that they must be able to cross-reference these two facts and the saletype and event dimensions. For example, if someone sees that product 'A' has 100 sales in the US ecommerce store of type 'new purchase', they want to see how many of 'those' 100 licenses also got activated... and then maybe they would want to see, out of the people that activated, how many have registered... and then (back to saletype), of those that registered, how many 'renewed'. And I cannot really define a hierarchy, because you could have a whole lot of combinations of these...

How can I do this? From what I'm reading, there seems to be no way to relate the two facts based on the license itself (which is what I need to do).

I was also thinking that maybe I could have one fact table and 'technically' combine the saletype and the eventtype into one big eventtype dimension. The fact table would then be a big transaction fact table with an EventID foreign key to the events dimension. But then I still have a fact table with a row for every event that happens to a license. The license is repeated, and one event can appear for a license more than once (on different dates). So, if I choose to see the totals for an event, how can I see how many of those licenses also exist for a different event?

I need to provide all these numbers as measures, so that a business user can see them on the fly (using whatever OLAP browser they want to use).

Note: I am using SQL Server Analysis Services and SQL Server 2008 R2.

Just as a reference, this is what I have now:

  1. DimProducts (PK: ProductID, and other attributes)
  2. DimDate (PK: DateKey, and other attributes)
  3. DimEvent (PK: EventID, and other attributes)

  4. FactLicenses (FK: ProductID; FK: DateKey; FK: EventID; and a License field (varchar))

So a license is repeated, with a row for every time something happens to it (installed, activated, renewed, cancelled, renewed again). The same license can appear with the same EventID more than once, but never on the same DateKey. The primary key of the table is DateKey + EventID + License.
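
As a rough sketch of the table described above (not the asker's actual DDL), the following uses Python's built-in sqlite3 module as a stand-in for SQL Server. The column types, the `Name` columns and the sample values such as `LIC-001` are assumptions made for illustration. It shows the composite key accepting the same license and event on different dates while rejecting a repeat on the same date.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimProducts (ProductID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DimDate     (DateKey   INTEGER PRIMARY KEY);
CREATE TABLE DimEvent    (EventID   INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE FactLicenses (
    ProductID INTEGER NOT NULL REFERENCES DimProducts(ProductID),
    DateKey   INTEGER NOT NULL REFERENCES DimDate(DateKey),
    EventID   INTEGER NOT NULL REFERENCES DimEvent(EventID),
    License   TEXT    NOT NULL,
    PRIMARY KEY (DateKey, EventID, License)
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO DimProducts VALUES (1, 'A')")
conn.executemany("INSERT INTO DimDate VALUES (?)", [(20110101,), (20120101,)])
conn.execute("INSERT INTO DimEvent VALUES (1, 'renewed')")

# Same license, same event, different dates: both rows are accepted.
conn.execute("INSERT INTO FactLicenses VALUES (1, 20110101, 1, 'LIC-001')")
conn.execute("INSERT INTO FactLicenses VALUES (1, 20120101, 1, 'LIC-001')")


def try_duplicate():
    """Try to repeat an existing (DateKey, EventID, License) combination."""
    try:
        conn.execute("INSERT INTO FactLicenses VALUES (1, 20120101, 1, 'LIC-001')")
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_duplicate()
row_count = conn.execute("SELECT COUNT(*) FROM FactLicenses").fetchone()[0]
print(duplicate_accepted, row_count)
```

Because the license is only a varchar column inside the key, nothing in this layout ties the rows of one license together beyond string equality, which is the difficulty the question describes.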

EDIT:

I've read in many places that the fact table in a situation like this should be an accumulating fact table, which has multiple columns pointing to the same type of dimension (i.e. date), and that I should create a role-playing dimension for each of those columns. But how do you account for the fact that a license can get renewed multiple times, installed multiple times, and so on?

I've since gone back to Ralph Kimball's book and found a case study that solves this issue for me. I've also merged the sale types and event types into one major group. Given that, there are still two groups of things: things that can happen to a license once, and things that can happen to a license multiple times. Everything that can happen to a license once is now stored in an accumulating fact table. Everything that can happen to a license multiple times is stored in a different table (a separate table for each entity or 'type' of event that can happen).
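
The split described above might look something like the following sketch, again using Python's sqlite3 module in place of SQL Server. The table names `FactLicenseLifecycle` and `FactRenewals`, the date columns and the sample data are all hypothetical; the post does not give the actual schema. One-time events become nullable date columns on a one-row-per-license table, and renewals get their own table keyed by license.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Accumulating table: one row per license, one date column per one-time event.
CREATE TABLE FactLicenseLifecycle (
    License           TEXT PRIMARY KEY,
    ProductID         INTEGER NOT NULL,
    PurchaseDateKey   INTEGER NOT NULL,
    ActivatedDateKey  INTEGER,          -- NULL until the event happens
    RegisteredDateKey INTEGER
);
-- Repeatable event: one row per renewal.
CREATE TABLE FactRenewals (
    License        TEXT NOT NULL REFERENCES FactLicenseLifecycle(License),
    RenewalDateKey INTEGER NOT NULL,
    PRIMARY KEY (License, RenewalDateKey)
);
""")

# Hypothetical sample data.
conn.executemany("INSERT INTO FactLicenseLifecycle VALUES (?, ?, ?, ?, ?)", [
    ("LIC-001", 1, 20110101, 20110102, 20110103),
    ("LIC-002", 1, 20110101, 20110105, None),
    ("LIC-003", 1, 20110102, None, None),
    ("LIC-004", 1, 20110103, 20110104, 20110110),
])
conn.executemany("INSERT INTO FactRenewals VALUES (?, ?)", [
    ("LIC-001", 20120101),
    ("LIC-001", 20130101),
    ("LIC-002", 20120101),
])

# Funnel: sold -> activated -> activated and registered -> registered and renewed.
# The DISTINCT subquery keeps a license with several renewals from being counted twice.
funnel = conn.execute("""
SELECT COUNT(*)                  AS sold,
       COUNT(l.ActivatedDateKey) AS activated,
       COUNT(CASE WHEN l.ActivatedDateKey IS NOT NULL
                  THEN l.RegisteredDateKey END) AS activated_and_registered,
       SUM(CASE WHEN l.RegisteredDateKey IS NOT NULL
                 AND r.License IS NOT NULL
                THEN 1 ELSE 0 END) AS registered_and_renewed
FROM FactLicenseLifecycle AS l
LEFT JOIN (SELECT DISTINCT License FROM FactRenewals) AS r
       ON r.License = l.License
WHERE l.ProductID = 1
""").fetchone()
print(funnel)
```

Joining the repeatable-event table back through the one-row-per-license table is what lets the two kinds of facts be counted against the same set of licenses.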

This effectively solved the problem for me, because in Analysis Services I am now able to create a 'referenced' relationship, where the relationship is the 'license'. So any of my dimensions that are related to the other tables can be linked via the original accumulating fact table (which has the license column).

Thanks for your input, to everyone who tried to answer.

I think your design already accommodates this type of analysis, though your situation really consists of two queries.

The first would find the number and value of sales by summing values in the SALES fact table for product 'A' and source 'USA'. For example:

-- Unit count and total value of sales for product 'A' from source 'USA'
SELECT COUNT(*)             AS TOTAL_UNIT_SALES,
       SUM(FCT_SALES.VALUE) AS TOTAL_VALUE
FROM   FCT_SALES
JOIN   DIM_PRODUCTS ON FCT_SALES.PRODUCT_FK = DIM_PRODUCTS.PRODUCT_SK
JOIN   DIM_SOURCES  ON FCT_SALES.SOURCE_FK  = DIM_SOURCES.SOURCE_SK
WHERE  DIM_PRODUCTS.NAME = 'A'
AND    DIM_SOURCES.NAME  = 'USA';

The second would pivot or sum records in the EVENTS fact table for the same set of dimensional foreign keys, to find out how many events of each type occurred. For example:

-- One conditional count per event type, for product 'A' from source 'USA'
SELECT SUM(CASE WHEN DIM_SALE_TYPES.NAME = 'NEW'          THEN 1 ELSE 0 END) AS TOTAL_NEW_SALES,
       SUM(CASE WHEN DIM_SALE_TYPES.NAME = 'ACTIVATION'   THEN 1 ELSE 0 END) AS TOTAL_ACTIVATIONS,
       SUM(CASE WHEN DIM_SALE_TYPES.NAME = 'REGISTRATION' THEN 1 ELSE 0 END) AS TOTAL_REGISTRATIONS
FROM   FCT_EVENTS
JOIN   DIM_PRODUCTS   ON FCT_EVENTS.PRODUCT_FK   = DIM_PRODUCTS.PRODUCT_SK
JOIN   DIM_SOURCES    ON FCT_EVENTS.SOURCE_FK    = DIM_SOURCES.SOURCE_SK
JOIN   DIM_SALE_TYPES ON FCT_EVENTS.SALE_TYPE_FK = DIM_SALE_TYPES.SALE_TYPE_SK
WHERE  DIM_PRODUCTS.NAME = 'A'
AND    DIM_SOURCES.NAME  = 'USA';

I would strongly suggest adding license as a separate dimension. Can it be associated with some unique identifier, say, a license number or activation key?
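
One way this suggestion could play out is sketched below with Python's sqlite3 module standing in for SQL Server. `DimLicense`, `FactLicenseEvents`, the helper `licenses_with_all` and the sample rows are hypothetical names and data, not part of the original design. With a license key on every fact row, "how many licenses with event X also have event Y" becomes a group-by on that key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimLicense (
    LicenseKey    INTEGER PRIMARY KEY,
    LicenseNumber TEXT UNIQUE NOT NULL
);
CREATE TABLE DimEvent (
    EventID INTEGER PRIMARY KEY,
    Name    TEXT UNIQUE NOT NULL
);
CREATE TABLE FactLicenseEvents (
    LicenseKey INTEGER NOT NULL REFERENCES DimLicense(LicenseKey),
    EventID    INTEGER NOT NULL REFERENCES DimEvent(EventID),
    DateKey    INTEGER NOT NULL,
    PRIMARY KEY (DateKey, EventID, LicenseKey)
);
""")

# Hypothetical sample data: three licenses with different histories.
conn.executemany("INSERT INTO DimLicense VALUES (?, ?)",
                 [(1, "LIC-001"), (2, "LIC-002"), (3, "LIC-003")])
conn.executemany("INSERT INTO DimEvent VALUES (?, ?)",
                 [(1, "new purchase"), (2, "activated"), (3, "renewed")])
conn.executemany("INSERT INTO FactLicenseEvents VALUES (?, ?, ?)", [
    (1, 1, 20110101), (1, 2, 20110102), (1, 3, 20120101), (1, 3, 20130101),
    (2, 1, 20110101), (2, 2, 20110105),
    (3, 1, 20110102),
])


def licenses_with_all(conn, event_names):
    """Count distinct licenses that have at least one row for every named event."""
    names = sorted(set(event_names))
    placeholders = ",".join("?" for _ in names)
    sql = f"""
        SELECT COUNT(*) FROM (
            SELECT f.LicenseKey
            FROM FactLicenseEvents AS f
            JOIN DimEvent AS e ON e.EventID = f.EventID
            WHERE e.Name IN ({placeholders})
            GROUP BY f.LicenseKey
            HAVING COUNT(DISTINCT e.Name) = ?
        )
    """
    return conn.execute(sql, [*names, len(names)]).fetchone()[0]


purchased = licenses_with_all(conn, ["new purchase"])
purchased_and_activated = licenses_with_all(conn, ["new purchase", "activated"])
all_three = licenses_with_all(conn, ["new purchase", "activated", "renewed"])
print(purchased, purchased_and_activated, all_three)
```

Counting distinct event names per license, rather than rows, keeps a license that renewed twice from satisfying the condition on renewals alone or being counted more than once.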

