Most common granularities in data warehouse designs

I have been looking for an answer to this question for a while:

When I ask about granularity, the examples given straight away are transaction, day, week, month, etc. I couldn't find any other type of example. For instance, could 'city', 'state', etc. also be considered granularities, say when we look at sales for a nationwide company? In other words, is granularity always time-based?

No, granularity is not always related to time. Your lowest granularity will often be some kind of transaction. One of the examples Kimball uses is from a retail setting: the lowest granularity relating to product sales might be an item being scanned at the check-out. Two such transactions could happen at the same moment, so this is not a time-based granularity.
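To make the check-out example concrete, here is a minimal sketch in Python. The rows, column names and the `is_unique_key` helper are all made up for illustration. It checks which columns identify a row uniquely: the timestamp alone does not, because two scans share the same moment, while the scan itself does.

```python
from collections import Counter
from datetime import datetime

# Hypothetical check-out scans: one row per item scanned (the grain).
scans = [
    {"scan_id": 1, "store": "S1", "product": "milk",
     "scanned_at": datetime(2024, 1, 5, 10, 0, 0), "amount": 1.20},
    {"scan_id": 2, "store": "S2", "product": "bread",
     "scanned_at": datetime(2024, 1, 5, 10, 0, 0), "amount": 2.50},
    {"scan_id": 3, "store": "S1", "product": "eggs",
     "scanned_at": datetime(2024, 1, 5, 10, 0, 7), "amount": 3.10},
]


def is_unique_key(rows, columns):
    """Return True if the given columns identify every row uniquely."""
    keys = Counter(tuple(row[c] for c in columns) for row in rows)
    return all(count == 1 for count in keys.values())


# Scans 1 and 2 happen at the same moment, so time alone is not the grain.
timestamp_is_grain = is_unique_key(scans, ["scanned_at"])
scan_is_grain = is_unique_key(scans, ["scan_id"])
print(timestamp_is_grain, scan_is_grain)
```

The grain here is "one row per item scanned"; the timestamp is just one more attribute of that row.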

Just about anything could be the granularity of a table, but Kimball advises working to the lowest granularity, as this is far more flexible: you can then slice and dice your data in more ways. You might choose to have some aggregated tables where you sum data up to Week level, or State level, or pretty much anything else (possibly for performance reasons, or to make it easier for certain users), but these are unlikely to be your lowest granularity.
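As a rough illustration of why the lowest granularity is more flexible, the sketch below rolls the same order-item rows up to week level, state level, and state/city level. The data, the column names and the `roll_up` helper are assumptions made for the example, not anything prescribed by Kimball.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order-item rows stored at the lowest grain.
order_items = [
    {"order_date": date(2024, 1, 1), "state": "CA", "city": "Fresno", "amount": 10.0},
    {"order_date": date(2024, 1, 3), "state": "CA", "city": "San Diego", "amount": 20.0},
    {"order_date": date(2024, 1, 9), "state": "NY", "city": "Albany", "amount": 5.0},
    {"order_date": date(2024, 1, 10), "state": "CA", "city": "Fresno", "amount": 7.5},
]


def roll_up(rows, key_fn):
    """Sum 'amount' over rows grouped by whatever key_fn returns."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_fn(row)] += row["amount"]
    return dict(totals)


def iso_week(row):
    """Group key for a time-based aggregate: (ISO year, ISO week)."""
    year, week, _ = row["order_date"].isocalendar()
    return (year, week)


# The same detailed rows can be aggregated along any dimension.
by_week = roll_up(order_items, iso_week)
by_state = roll_up(order_items, lambda r: r["state"])
by_state_city = roll_up(order_items, lambda r: (r["state"], r["city"]))
```

Going the other way is not possible: a table stored only at State level cannot be broken back down into cities or weeks.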

Using State as an example: you presumably have lower-level information within the same hierarchy that you could analyse sales data by, like county, city or ZIP code. You may well also have data on the individual customer, the specific order reference, which shop or sales office was involved, which employees were involved in processing the order, etc. So it would be odd to choose State as the granularity of a fact table, unless you had some specific reason to aggregate up from a transaction fact table that was based on something like order item.

Where you do often see date or time fields as the granularity of a table is in periodic snapshot facts, but again these are generally aggregated up from other, lower-granularity data sources.
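A periodic snapshot can be sketched like this. The inventory scenario, the column names and the `daily_snapshot` helper are all hypothetical, chosen only to show a table whose grain is (day, product) being derived from transaction-grain rows.

```python
from datetime import date, timedelta

# Hypothetical stock movements at transaction grain
# (positive = received, negative = sold).
movements = [
    {"day": date(2024, 1, 1), "product": "milk", "qty": 10},
    {"day": date(2024, 1, 1), "product": "milk", "qty": -3},
    {"day": date(2024, 1, 3), "product": "milk", "qty": -2},
]


def daily_snapshot(rows, product, start, end):
    """Build one row per day for a product: the grain is (day, product)."""
    snapshot = []
    on_hand = 0
    day = start
    while day <= end:
        # Apply every movement recorded for this product on this day.
        on_hand += sum(
            r["qty"] for r in rows
            if r["product"] == product and r["day"] == day
        )
        snapshot.append({"day": day, "product": product, "on_hand": on_hand})
        day += timedelta(days=1)
    return snapshot


snapshot = daily_snapshot(movements, "milk", date(2024, 1, 1), date(2024, 1, 3))
```

Note that the snapshot has a row for 2 January even though nothing happened that day, which is what makes the date part of the grain.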
