SQL/SSIS data warehouse fact table loading: best practices?
I am building my first data warehouse in SQL Server 2008/SSIS and I am looking for some best practices around loading the fact tables.
Currently my DW has about 20 dimensions (Offices, Employees, Products, Customers, etc.), all of them Type 1 SCDs. In my DW structure, there are a few things I have already applied:
In my fact-loading SSIS project, my current method for resolving dimension keys is to run multiple lookups (20+), one against each of the dimension tables, and then populate the fact table with the data.
For my lookups I set:
Is this the best approach? Pictures are attached to help with my description above.
Looks fine. There are options if you start to run into performance issues, but if this is stable (it finishes within the data-loading time window, the source systems aren't being drained of resources, etc.), then I see no reason to change.
Some potential issues to keep an eye on...
A common alternative (to what you have above) is to extract the fact table data from the source system and land it in a staging area before doing the dimension key lookups via a single SQL statement. Some even keep a set of dimension key mapping tables in the staging area specifically for this purpose. This reduces locking/blocking on the source system, which matters if each load carries a lot of data and you would otherwise have to block the source system while you pull the data out and run it through those 20+ lookup transforms.
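To make the "single SQL statement" idea concrete, here is a minimal sketch. It uses SQLite through Python only so that it runs anywhere; on SQL Server the same `INSERT ... SELECT` with one `LEFT JOIN` per dimension should carry over. All table and column names (`stg_sales`, `dim_office`, `dim_product`, `fact_sales`) are made up for illustration, and the `-1` "Unknown" member for unmatched business keys is an assumption, not something from the question.

```python
import sqlite3

# Hypothetical schema: stg_sales holds staged fact rows with business keys,
# dim_office / dim_product are Type 1 dimensions, fact_sales is the target.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_office  (office_key  INTEGER PRIMARY KEY, office_code  TEXT UNIQUE);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_code TEXT UNIQUE);
CREATE TABLE stg_sales   (office_code TEXT, product_code TEXT, amount REAL);
CREATE TABLE fact_sales  (office_key INTEGER, product_key INTEGER, amount REAL);

-- Key -1 is an assumed "Unknown" member for business keys with no match.
INSERT INTO dim_office  VALUES (-1, 'UNKNOWN'), (1, 'NYC'), (2, 'LON');
INSERT INTO dim_product VALUES (-1, 'UNKNOWN'), (1, 'WIDGET');
INSERT INTO stg_sales   VALUES ('NYC', 'WIDGET', 10.0),
                               ('LON', 'WIDGET', 20.0),
                               ('PAR', 'GADGET', 5.0);
""")

# One set-based statement resolves every dimension key: a LEFT JOIN per
# dimension, with COALESCE routing unmatched rows to the Unknown member.
conn.execute("""
INSERT INTO fact_sales (office_key, product_key, amount)
SELECT COALESCE(o.office_key, -1),
       COALESCE(p.product_key, -1),
       s.amount
FROM stg_sales AS s
LEFT JOIN dim_office  AS o ON o.office_code  = s.office_code
LEFT JOIN dim_product AS p ON p.product_code = s.product_code
""")
conn.commit()

rows = conn.execute(
    "SELECT office_key, product_key, amount FROM fact_sales ORDER BY amount"
).fetchall()
print(rows)
```

With 20+ dimensions the statement simply grows to 20+ joins. Because it runs entirely against the staging and warehouse tables, the source system is only touched during the initial extract.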
Having a good staging area strategy becomes more important when you have a large amount of data, large dimensions, complex key mappings (usually due to multiple source systems), and short data-loading time windows.