
Informatica 9.5.1, huge table (SCD1)

I have a table in Oracle with about 860 million records (850 GB). On top of that we receive about 2-3 million records per run as source (flat file). We do a lookup on the target: if the record already exists it is updated, and if it is a new record it is inserted (SCD1). The transformations we use are an unconnected lookup, sorter, filter, router, and update strategy. It was fine all this time, but as the table keeps growing it is taking forever to insert and update. Last night it took 19 hours for 2.4 million records (2.1 million were new and inserted, the rest were updates). Today I have about 1.9 million to go through and I am not sure how long it will take. Any suggestions or help on how we can handle this?

1) Use just a connected lookup to the Oracle table right after the Source Qualifier, matching on the primary key. Rows where the lookup returns NULL are missing in the Oracle table (inserts); rows where it is not NULL are updates. Don't check other columns for the update. Skip the sorter and filter and just use an Update Strategy (see the expression sketch after point 3).

2) Or use a Joiner with the flat-file pipeline as the master, then check for NULLs the same way to decide between inserts and updates.

3) Check whether your target table has any triggers on it. If it does, review their logic and implement it in the ETL instead.
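
For 1) and 2), the insert/update decision can be made directly in the Update Strategy expression. A minimal sketch in Informatica expression language, assuming the lookup (or joiner) returns a port named lkp_cust_id that is NULL when the key is not found (the port name is illustrative):

    IIF(ISNULL(lkp_cust_id), DD_INSERT, DD_UPDATE)

Rows flagged DD_INSERT go straight to the insert path and DD_UPDATE rows go to the update path, so no separate sorter or filter is needed.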

How many inserts vs updates do you have?

  • With just a few updates, try using the Update else Insert target property.
  • If there are many updates and few inserts, perform the update whenever the key is found, without checking whether anything has changed.
  • If many source rows match what you already have (i.e. updates that don't change anything), try to eliminate them. But don't compare all the columns - use a hash instead: add a computed column that holds an MD5 calculated over all columns. Then you only need to compare one column instead of all of them to detect a change.
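
A sketch of such a hash expression, assuming Informatica's MD5() string function and three illustrative columns (a delimiter between the values avoids false matches when data shifts between fields):

    MD5( TO_CHAR(cust_id) || '|' || cust_name || '|' || TO_CHAR(last_order_date) )

Store this value in the extra column and compare only that column to decide whether an update actually changes anything.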

Since you are dealing with about 860 million rows (850 GB), you have two major bottlenecks - the target lookup and writing into the target. You can think of this strategy:

  • Mapping 1 - Create a new mapping to load the flat-file data into a temp table TMP1.
  • Mapping 2 - Modify the existing mapping: just change the lookup query so that the SQL override joins TMP1 with the 860-million-row target table. This reduces time, I/O and the lookup cache size. Also make sure you have an index on the key columns in the target, and drop and recreate all other indexes around the load. Skipping the sorter will help, but adding a joiner will not help much.
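
A possible shape for that lookup SQL override (table and column names are illustrative):

    SELECT t.cust_id
    FROM   target_table t
    INNER JOIN tmp1 s
            ON s.cust_id = t.cust_id

This way the lookup cache only holds keys that can actually appear in the current load, instead of all 860 million rows.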

Regards, Koushik

1) Try using a MERGE statement if the source and target are in the same database (see the sketch after point 2).

2) You can also use a SQL*Loader (external loader) connection to improve the load performance.
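
For 1), if the flat file is first staged into a table in the same Oracle database, the whole insert/update step can become a single MERGE. A sketch with illustrative table and column names:

    MERGE INTO target_table t
    USING stg_incoming s
       ON (t.cust_id = s.cust_id)
    WHEN MATCHED THEN
      UPDATE SET t.cust_name       = s.cust_name,
                 t.last_order_date = s.last_order_date
    WHEN NOT MATCHED THEN
      INSERT (cust_id, cust_name, last_order_date)
      VALUES (s.cust_id, s.cust_name, s.last_order_date);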

Clearly the bottleneck is in the target lookup and target load (update to be specific).

Try the following to tune the existing code:

1) Remove any unwanted ports from the lookup transformation. Keep only the fields used in the lookup condition, since you are using it just to check whether the record exists.

2) Try adding an index to the target table on the fields you are using for the update.
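
For example, assuming the update matches on a key column called cust_id (if that key is already covered by a primary-key or unique index, this step is not needed):

    CREATE INDEX ix_target_cust_id ON target_table (cust_id);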

3) Increase the commit interval of the session to a higher value.

4) Partial Pushdown optimization:

You can push down some of the processing to the database, which might be faster than doing it in Informatica:

  • Create a staging table to hold the incoming data for that load.

  • Create a mapping to load the incoming file to the staging table. Truncate it before the start of the load to clear the records of the previous run.

  • In the SQL override of the existing mapping, do a left join between the staging table and the target table to identify inserts vs. updates. This is faster than the Informatica lookup and eliminates the time taken to build the Informatica lookup cache.
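
The override could look roughly like this (names are illustrative); a NULL tgt_cust_id means the row is an insert, anything else is an update:

    SELECT s.cust_id,
           s.cust_name,
           s.last_order_date,
           t.cust_id AS tgt_cust_id
    FROM   stg_incoming s
    LEFT JOIN target_table t
           ON t.cust_id = s.cust_id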

5) Using MD5 to eliminate unwanted updates

  • To use MD5 you need to add a new field to the target table and run a one-time mapping to populate it for the existing records.

  • Then, in your existing mapping, add a step that computes the MD5 over the incoming columns.

  • If the record is identified as an update, check whether the MD5 computed for the incoming row is the same as the value stored in the target column. If the checksums match, don't update the record; only update it if the checksum is different. This way you filter out the unwanted updates. If there is no lookup match, insert the record (see the routing sketch after this list).

    Advantages: You are reducing the unwanted updates.

    Disadvantages: You have to run a one-time process to populate the MD5 values for the existing records in the table.
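
Putting 4) and 5) together, the Router conditions in the mapping could look like this (an Informatica expression sketch with illustrative port names; tgt_cust_id and tgt_md5 come from the lookup or the left join above, and in_md5 is the hash computed on the incoming row):

    INSERT group:  ISNULL(tgt_cust_id)
    UPDATE group:  NOT ISNULL(tgt_cust_id) AND in_md5 <> tgt_md5

Rows that match neither group (unchanged records) fall into the default group and can simply be dropped.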

If none of this works, check with your database administrator to see whether there is any issue on the database side that might be slowing down the load.
