
Should I COMMIT after every execute batch?

I have a file with 1 trillion records. The batch size is 1000, after which the batch is executed.

Should I commit after each batch? Or commit just once, after all 1 trillion records have been executed in batches of 1000?

{ // Loop over 1 trillion records
    statement.addBatch();

    if (++count % 1000 == 0)
    {
        statement.executeBatch();
        // SHOULD I COMMIT HERE AFTER EACH BATCH ???
    }

} // End loop
// SHOULD I COMMIT HERE ONCE ONLY ????

A commit marks the end of a successful transaction. So, in theory, the commit should happen after all rows have been executed successfully. If the execution statements are completely independent, then each one should have its own commit (in theory).

But the database system may impose limitations that require splitting the rows into several batches, each with its own commit. Since a database has to reserve space to be able to roll back changes until they are committed, the "cost" of a huge transaction can be very high.

So the answer is: it depends on your requirements, your database, and your environment.
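The middle ground between "commit every batch" and "one giant commit" is to commit every N batches, so a failure only forces you to redo the uncommitted window. A minimal sketch of that pattern follows; the question is about Java/JDBC, but Python's built-in sqlite3 module is used here as a self-contained stand-in, and the table, batch sizes, and row count are made-up values for illustration. The shape is the same in JDBC: disable auto-commit, call `executeBatch()` per batch, and call `commit()` at a chosen interval.

```python
import sqlite3

# In-memory database stands in for the real server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")

BATCH_SIZE = 1000      # rows per executemany(), analogous to executeBatch()
COMMIT_EVERY = 10      # commit every 10 batches = 10,000 rows
TOTAL_ROWS = 25_000    # small stand-in for the huge input file

batch, batches_run = [], 0
for i in range(TOTAL_ROWS):
    batch.append((i, f"row-{i}"))
    if len(batch) == BATCH_SIZE:
        conn.executemany("INSERT INTO records (id, payload) VALUES (?, ?)", batch)
        batch.clear()
        batches_run += 1
        if batches_run % COMMIT_EVERY == 0:
            # Bounded transaction: a crash now loses at most COMMIT_EVERY batches.
            conn.commit()

# Flush the final partial batch and commit the tail.
if batch:
    conn.executemany("INSERT INTO records (id, payload) VALUES (?, ?)", batch)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)  # 25000
```

Tuning `COMMIT_EVERY` is the compromise the answer describes: larger windows mean fewer commits but more rollback space reserved and more work lost on failure.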

Mostly it depends on what you want to achieve; usually you have to compromise on something to gain something else. For example, I am deleting 3 million records that are no longer accessed by my users, using a stored procedure.

If I execute the delete all at once, the table lock gets escalated and my other users start getting timeout issues in our applications, because SQL Server has locked the table to give the deletion better performance (I know the question is not specific to SQL Server, but this may help debug the problem). If you have such a case, you should never go for a batch larger than 5000 rows. (See Lock Escalation Threshold.)

With my current plan, I am deleting 3000 rows per batch, so only key locks occur, which is good, and I commit after every half million records processed.

So, if you do not have simultaneous users hitting the table, you can delete the huge number of records in one go, provided your database server has enough log space and processing power; but 1 trillion records are a mess. You are better off proceeding with a batch-wise deletion. Or, if those 1 trillion records are all the records in the table and you want to delete every one of them, then I'd suggest going for a TRUNCATE TABLE instead.
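The batched delete described above can be sketched the same way. This is a sketch using Python's sqlite3 module rather than SQL Server, and the table name and sizes are invented for illustration; since SQLite has no `DELETE TOP (n)`, a rowid subquery stands in for SQL Server's `TOP` clause. The pattern is what matters: delete a bounded batch, and commit every so many rows so locks and log space stay bounded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_records (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO old_records VALUES (?)", [(i,) for i in range(10_000)])
conn.commit()

BATCH = 3000              # small batches stay below the lock-escalation threshold
COMMIT_EVERY_ROWS = 6000  # stand-in for "commit after half a million records"
deleted_since_commit = 0
total_deleted = 0

while True:
    # Delete one bounded batch; the subquery mimics SQL Server's DELETE TOP (3000).
    cur = conn.execute(
        "DELETE FROM old_records WHERE rowid IN "
        "(SELECT rowid FROM old_records LIMIT ?)", (BATCH,))
    if cur.rowcount == 0:
        break  # nothing left to delete
    total_deleted += cur.rowcount
    deleted_since_commit += cur.rowcount
    if deleted_since_commit >= COMMIT_EVERY_ROWS:
        conn.commit()  # release the transaction periodically
        deleted_since_commit = 0

conn.commit()  # commit the final partial window
remaining = conn.execute("SELECT COUNT(*) FROM old_records").fetchone()[0]
print(total_deleted, remaining)  # 10000 0
```

In SQL Server the loop body would be a `DELETE TOP (3000)` inside a `WHILE @@ROWCOUNT > 0` loop, but the commit cadence is chosen the same way.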
