
ETL as a transaction

For all the ETLs I have written so far, I have never made them transactional, i.e. if table 4 fails, roll everything back.

What is the best practice in this regard?

To "BeginTran + Commit" or not to "BeginTran + Commit"?

EDIT: I have one master package calling 4 other packages. Is it possible to roll them all up into one transaction?

In SSIS, I always Begin Tran + Commit. I want to make sure that, if the package fails, I can re-run it without issue (and without having to find out which rows actually got inserted).

It just makes recovery and cleanup so much easier.
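As a rough illustration of that all-or-nothing behaviour, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for SSIS and SQL Server. The `target` table, its columns, and the `load_all_or_nothing` helper are hypothetical names made up for this example; the point is only that a failed run leaves nothing behind, so the re-run starts clean.

```python
import sqlite3


def load_all_or_nothing(conn, rows):
    """Insert every row in one transaction; any failure rolls all of them back."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.executemany(
                "INSERT INTO target (id, amount) VALUES (?, ?)", rows
            )
        return True
    except sqlite3.Error:
        return False


conn = sqlite3.connect(":memory:")
# Hypothetical target table; NOT NULL gives us an easy way to force a failure.
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")

# First attempt: the second row violates NOT NULL, so nothing is kept.
bad_rows = [(1, 10.0), (2, None), (3, 30.0)]
ok = load_all_or_nothing(conn, bad_rows)
count_after_failure = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]

# Re-run with corrected data: no cleanup or duplicate hunting was needed.
good_rows = [(1, 10.0), (2, 20.0), (3, 30.0)]
ok_retry = load_all_or_nothing(conn, good_rows)
count_after_retry = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

Because the first row of the failed attempt was rolled back along with the bad one, the retry can insert `id = 1` again without hitting a primary key conflict.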

Begin + commit in manageable batch sizes. You don't want to wrap a 6-hour import into a single transaction every night. Keep your batches at a size that can finish in 2-3 minutes at most. It is a given that you will hit data purity issues that fail an ETL, so at least reduce the impact to something manageable (i.e. don't trigger a rollback that will take another 6 hours to complete).
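The batching idea might look something like the sketch below, again using Python's `sqlite3` rather than SSIS. The `target` table and the `batches` and `load_in_batches` helpers are hypothetical, and a fixed row count stands in for "whatever finishes in 2-3 minutes", which would have to be measured on the real system.

```python
import sqlite3
from itertools import islice


def batches(iterable, size):
    """Yield successive lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def load_in_batches(conn, rows, batch_size):
    """Commit each batch on its own; return how many rows were committed."""
    committed = 0
    for chunk in batches(rows, batch_size):
        try:
            with conn:  # one short transaction per batch
                conn.executemany(
                    "INSERT INTO target (id, amount) VALUES (?, ?)", chunk
                )
        except sqlite3.Error:
            break  # only the failing batch is rolled back
        committed += len(chunk)
    return committed


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER PRIMARY KEY, amount REAL NOT NULL)")

rows = [(i, float(i)) for i in range(1, 11)]
rows[6] = (7, None)  # bad row falls in the third batch (ids 7, 8, 9)

committed = load_in_batches(conn, rows, batch_size=3)
remaining = conn.execute("SELECT COUNT(*) FROM target").fetchone()[0]
```

Here the first two batches stay committed and only the batch containing the bad row is undone, so the rollback cost is bounded by one batch rather than the whole import.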

You are often moving too much data in ETL to use a SQL transaction (remember, the log has to store ALL the data needed to roll back). I prefer to design packages so that they can be re-run nondestructively. Ideally they should be set up so that if they die mid-stream, you can just start them again and they'll continue from approximately where they left off. Sometimes there's a performance penalty for that, but I think it's worth it.

Technically you can roll packages up into a single transaction; practically, maybe not.


 