
Loading Fact Table + Lookup / UnionAll for SK lookups

Original post: 2010-04-29 04:59:19. Tags: sql-server, ssis, lookup, surrogate-key, fact-table

I have to populate a fact table using 12 lookups against dimension tables to get the surrogate keys (SKs). Six of the lookups go to different dimension tables, and the other six go to the same dimension table (Type II), all matching on the same natural key.

For example:

PrimeObjectID => lookup to DimObject.ObjectID => get ObjectSK

and I have other columns that do the same:

OtherObjectID1 => lookup to DimObject.ObjectID => get ObjectSK

OtherObjectID2 => lookup to DimObject.ObjectID => get ObjectSK

OtherObjectID3 => lookup to DimObject.ObjectID => get ObjectSK

OtherObjectID4 => lookup to DimObject.ObjectID => get ObjectSK

OtherObjectID5 => lookup to DimObject.ObjectID => get ObjectSK

How should I handle multiple lookups like this in my SSIS package?

For now I am using a Lookup followed by a Union All for each lookup. Is there a better way to do this?

2 Answers

I assume that, for each lookup, you redirect errors to a derived column that sets default values for the failed lookups, and then merge the lookup and derived column outputs with a Union All. That pattern is fairly common, and I use it in early stages to help debug. However, a Union All is a partially blocking component in SSIS (it creates a new buffer when it executes, but then passes data through as soon as it comes in), so it decreases the overall efficiency of your package because of the overhead of creating new buffers in your data flow.

Usually, I configure the series of lookups to ignore errors, and after the last one I add a single derived column component that replaces the NULLs left by failed lookups with the default, for every column that is a lookup target. This allows the most efficient flow of data through your data flow. For more information on which data flow components are blocking or semi-blocking, see this post: http://sqlblog.com/blogs/jorg_klein/archive/2008/02/12/ssis-lookup-transformation-is-case-sensitive.aspx
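The "ignore errors, then default once at the end" flow can be sketched outside SSIS. The Python below is only an illustration of the logic, not an SSIS API: the dictionary `DIM_OBJECT`, the default key `UNKNOWN_SK = -1`, and all function names are assumptions made for the example.

```python
# Hypothetical stand-in for DimObject: natural key (ObjectID) -> surrogate key (ObjectSK).
DIM_OBJECT = {"A100": 1, "B200": 2, "C300": 3}

# Hypothetical default surrogate key for an "unknown" dimension member.
UNKNOWN_SK = -1

# Source columns that all resolve against the same dimension, as in the question.
OBJECT_ID_COLUMNS = ["PrimeObjectID", "OtherObjectID1", "OtherObjectID2"]


def sk_name(id_column):
    """Derive the target column name, e.g. PrimeObjectID -> PrimeObjectSK."""
    return id_column.replace("ObjectID", "ObjectSK")


def lookup_ignore_failure(row, id_column, dim):
    """Mimic a lookup set to ignore failures: a miss leaves None (NULL) in the target."""
    out = dict(row)
    out[sk_name(id_column)] = dim.get(row.get(id_column))
    return out


def apply_defaults(row, sk_columns, default):
    """Mimic one derived column step at the end: replace every NULL SK with the default."""
    out = dict(row)
    for col in sk_columns:
        if out[col] is None:
            out[col] = default
    return out


def resolve_row(row):
    """Run the lookups in series, then fix up all misses in a single pass."""
    for col in OBJECT_ID_COLUMNS:
        row = lookup_ignore_failure(row, col, DIM_OBJECT)
    return apply_defaults(row, [sk_name(c) for c in OBJECT_ID_COLUMNS], UNKNOWN_SK)


if __name__ == "__main__":
    source = {"PrimeObjectID": "A100", "OtherObjectID1": "ZZZ", "OtherObjectID2": "C300"}
    print(resolve_row(source))
```

The row passes straight through every lookup and is touched by the default logic only once, which is the point of dropping the per-lookup Union All.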

I don't understand why you are doing 2 lookups per dimension.

Typically we processed all the dimensions first (using the TableDifference component to infer/expire the dimension rows).

Then the fact table was loaded, doing one lookup on each dimension (in series) using the business keys to find the surrogate keys.
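As a rough sketch of that second step, the Python below runs one lookup per dimension, in series, from business key to surrogate key. The table contents, the `IsCurrent` flag, the `RegionCode` dimension, and the choice to match only the current Type II row are all assumptions for illustration; the answer does not say how the Type II version is selected.

```python
# Hypothetical Type II dimension rows: several versions can share one business key.
DIM_OBJECT_ROWS = [
    {"ObjectSK": 1, "ObjectID": "A100", "IsCurrent": False},
    {"ObjectSK": 4, "ObjectID": "A100", "IsCurrent": True},
    {"ObjectSK": 2, "ObjectID": "B200", "IsCurrent": True},
]

# Hypothetical second dimension, to show lookups against different tables.
DIM_REGION_ROWS = [
    {"RegionSK": 10, "RegionCode": "EU", "IsCurrent": True},
]


def build_lookup(dim_rows, business_key, surrogate_key):
    """Map business key -> surrogate key, keeping only the current version (assumption)."""
    return {r[business_key]: r[surrogate_key] for r in dim_rows if r["IsCurrent"]}


# One entry per lookup: (source column, fact SK column, lookup map).
LOOKUPS = [
    ("PrimeObjectID", "PrimeObjectSK",
     build_lookup(DIM_OBJECT_ROWS, "ObjectID", "ObjectSK")),
    ("RegionCode", "RegionSK",
     build_lookup(DIM_REGION_ROWS, "RegionCode", "RegionSK")),
]


def load_fact_row(source_row):
    """Apply each lookup once, in series; a missing business key yields None."""
    fact_row = dict(source_row)
    for source_col, sk_col, lookup in LOOKUPS:
        fact_row[sk_col] = lookup.get(source_row[source_col])
    return fact_row


if __name__ == "__main__":
    print(load_fact_row({"PrimeObjectID": "A100", "RegionCode": "EU", "Amount": 5}))
```

Because the dimensions are processed before the fact load, every business key in the source should already have a row to match, so each lookup runs once with no error branch.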