
SSIS Lookup with filtered reference table

I am trying to improve the performance of an SSIS package.

One thing I started with is filtering the reference table of the Lookups. Until now, I was using a whole table as the reference table for the lookup.

The first improvement was to change the table to a SQL query that selects just the columns I need from that table.
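As a minimal sketch of that first improvement (the table and column names here are made up for illustration, not taken from the actual package), the Lookup's reference source changes from "Use a table or view" to a query like:

```sql
-- Hypothetical names: replace dbo.Documents / DocumentId / BinaryContent
-- with the real reference table and columns.
SELECT DocumentId,    -- the join key the Lookup matches on
       BinaryContent  -- the only output column actually needed downstream
FROM   dbo.Documents;
```

This avoids pulling every column of the table into the Lookup cache, but it still loads every row.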

Next, I want to load into this table just the records I know I'll use for sure. If I keep it in its current state, I load 300,000 rows or more (huge rows, with binary content of around 500 KB each) and use only around 100 of them.

I would put some filters in the SQL query that defines the lookup's reference table, BUT in that filter I need to use ALL the IDs of the rows loaded by my OLE DB source.

Is there any way to do this?

I thought of loading one row at a time using an OLE DB Command instead of a Lookup, but besides being time-consuming, I might end up loading the same thing 100 times for 100 different rows, when I could just load it once in the lookup and use it 100 times...

Enabling the cache would be another option, but it doesn't sound very good either, because it would slow us down - and we are already terribly slow.

Any ideas are greatly appreciated.

One possibility is to first stream the distinct IDs to a permanent/temporary table in one data flow, and then use it in your lookup (with a join) in a later data flow (you will probably have to defer validation).
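The two steps above could be sketched in T-SQL roughly as follows (the staging table `dbo.LookupKeys` and the reference table `dbo.Documents` are assumed names for illustration): the first data flow writes the source's distinct IDs into a staging table, and the second data flow's Lookup uses a join against that table as its reference query, so only the rows that will actually be matched are loaded into the cache.

```sql
-- Step 1: staging table filled by the first data flow
-- (e.g. an Aggregate "group by" on the ID column feeding an OLE DB Destination).
CREATE TABLE dbo.LookupKeys (DocumentId INT PRIMARY KEY);

-- Step 2: reference query for the Lookup in the later data flow;
-- only rows whose IDs actually occur in the source get loaded.
SELECT d.DocumentId,
       d.BinaryContent
FROM   dbo.Documents AS d
       INNER JOIN dbo.LookupKeys AS k
               ON k.DocumentId = d.DocumentId;
```

Since the staging table is only populated at run time, the second data flow task typically needs DelayValidation set to True, which is the "defer validation" mentioned above.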

In many of our ETL packages, we first stream the data into a Raw File, handling all the type conversions and everything else on the way there. Then, once all those conversions have succeeded, we handle creating the new dimensions, and then the facts linking to the dimensions.
