
Best way to handle updates on a table

I am looking for a better way to update tables using SSIS. Specifically, I want to optimize the updates on about 10 tables that all use the same logic.

The logic is:

  1. Select the source data from staging, then insert it into a physical temp table in the DW (i.e. TMP_Tbl).
  2. Update all rows in MyTbl that match TMP_Tbl on the customerId column.
  3. Insert into MyTbl all rows from TMP_Tbl whose customerId does not yet exist there.
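
To make steps 2 and 3 concrete, here is a minimal sketch of the update-then-insert logic. It uses Python's built-in sqlite3 module purely as a stand-in for SQL Server, so the SQL is SQLite syntax rather than T-SQL. The `name` column, the sample rows, and the function name `upsert_from_temp` are assumptions made for illustration; only `MyTbl`, `TMP_Tbl`, and `customerId` come from the question.

```python
import sqlite3


def upsert_from_temp(conn):
    """Steps 2 and 3: update rows matching on customerId, then insert the rest."""
    with conn:  # run both statements in one transaction
        # Step 2: overwrite existing rows with the values from TMP_Tbl.
        conn.execute("""
            UPDATE MyTbl
               SET name = (SELECT t.name FROM TMP_Tbl AS t
                            WHERE t.customerId = MyTbl.customerId)
             WHERE EXISTS (SELECT 1 FROM TMP_Tbl AS t
                            WHERE t.customerId = MyTbl.customerId)
        """)
        # Step 3: add the customerIds that MyTbl does not have yet.
        conn.execute("""
            INSERT INTO MyTbl (customerId, name)
            SELECT t.customerId, t.name
              FROM TMP_Tbl AS t
             WHERE NOT EXISTS (SELECT 1 FROM MyTbl AS m
                                WHERE m.customerId = t.customerId)
        """)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTbl   (customerId INTEGER, name TEXT);  -- hypothetical schema
    CREATE TABLE TMP_Tbl (customerId INTEGER, name TEXT);
    INSERT INTO MyTbl   VALUES (1, 'old-a'), (2, 'old-b');
    INSERT INTO TMP_Tbl VALUES (2, 'new-b'), (3, 'new-c');
""")
upsert_from_temp(conn)
rows = conn.execute(
    "SELECT customerId, name FROM MyTbl ORDER BY customerId"
).fetchall()
print(rows)
```

Customer 1 is untouched, customer 2 is updated, and customer 3 is inserted. Without an index on customerId, both statements have to scan the tables to find matches, which is where most of the time is likely to go on the larger tables.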

With these steps, populating TMP_Tbl takes some time. I therefore planned to change the logic to delete-insert, but according to "In SQL, is UPDATE always faster than DELETE+INSERT?", this would be a recipe for pain.

Given:

  • no indexes or keys are used on the tables
  • some tables contain 5M rows, some contain 2k rows
  • each table update takes up to 2-3 minutes, about 15 to 20 minutes all in all
  • these updates are in separate sequence containers that run simultaneously

Does anyone know the best approach here? It seems the physical temp table needs to be removed. Is this normal?

With SSIS you usually BULK INSERT, not INSERT. So if you do not mind the DELETE, reinserting the rows should in general outperform UPDATE.

Considering this, the faster approach will be:

  1. [Execute SQL Task] Delete all records which you need to update. (Depending on your DB design and queries, some index may help here.)

  2. [Data Flow Task] Fast load (using OLE DB Destination, Data access mode: Table or view - fast load) both updated and new records from source into MyTbl. No need for temp tables here.
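
The two tasks above can be sketched as follows. This is only an illustration of the delete-then-reload pattern, again using Python's sqlite3 module as a stand-in for SQL Server. The batched `executemany` insert stands in for the SSIS fast load and is not the same mechanism. The `name` column, the sample rows, and the function name `delete_and_reload` are assumptions.

```python
import sqlite3


def delete_and_reload(conn, source_rows):
    """Delete every customerId present in the source, then load all source rows."""
    ids = [(customer_id,) for customer_id, _ in source_rows]
    with conn:  # one transaction for both steps
        # Task 1: remove the rows that are about to be replaced.
        conn.executemany("DELETE FROM MyTbl WHERE customerId = ?", ids)
        # Task 2: load updated and new rows together, with no temp table.
        conn.executemany(
            "INSERT INTO MyTbl (customerId, name) VALUES (?, ?)", source_rows
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MyTbl (customerId INTEGER, name TEXT);  -- hypothetical schema
    INSERT INTO MyTbl VALUES (1, 'old-a'), (2, 'old-b');
""")
delete_and_reload(conn, [(2, "new-b"), (3, "new-c")])
result = conn.execute(
    "SELECT customerId, name FROM MyTbl ORDER BY customerId"
).fetchall()
print(result)
```

The end state matches the update-then-insert approach, but every incoming row goes through the same insert path. In SQL Server the delete would more likely be a single set-based statement joined against the staging data rather than one statement per key.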

If you cannot or don't want to DELETE records, your current approach is OK too. You just need to fix the performance of that UPDATE query (adding an index should help). 2-3 minutes per record updated is way too long. If it is 2-3 minutes for updating millions of records, though, then it's acceptable.

Adding the correct non-clustered index to a table should not result in "much more time on the updates". There will be a slight overhead, but if it helps your UPDATE to seek instead of scanning a big table, it is usually well worth it.
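
One way to see the seek-versus-scan difference is to compare the query plan before and after adding the index. The sketch below does this with Python's sqlite3 module and SQLite's `EXPLAIN QUERY PLAN`, as a stand-in for inspecting the execution plan in SQL Server. The plan text differs between engines and versions, and the index name `IX_MyTbl_customerId`, the `name` column, and the generated rows are assumptions.

```python
import sqlite3


def plan_for_update(conn):
    """Return the plan SQLite would use for an UPDATE filtered on customerId."""
    plan_rows = conn.execute(
        "EXPLAIN QUERY PLAN UPDATE MyTbl SET name = 'x' WHERE customerId = 42"
    ).fetchall()
    # The fourth column of each plan row holds the human-readable detail text.
    return " | ".join(row[3] for row in plan_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTbl (customerId INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO MyTbl VALUES (?, ?)", ((i, f"c{i}") for i in range(1000))
)

before = plan_for_update(conn)  # no index yet: expect a full table scan
conn.execute("CREATE INDEX IX_MyTbl_customerId ON MyTbl (customerId)")
after = plan_for_update(conn)   # with the index: expect an index search

print("before:", before)
print("after: ", after)
```

The first plan should report a scan of MyTbl and the second a search using the new index. The same check in SQL Server would show whether the UPDATE in the package is seeking or scanning.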

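Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.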
