Distributed transactions - why do we save tranlogs to the file system?

All transaction managers (Atomikos, Bitronix, IBM WebSphere TM, etc.) save "transaction logs" to a 'tranlogs' folder on the file system.

When something terrible happens and the server goes down, the tranlogs sometimes become corrupted. They then require a manual recovery procedure.

I've been told that by simply clearing a broken tranlogs folder I risk leaving the resources that participated in transactions in an inconsistent state.

As a "dumb" developer I feel more comfortable with simple concepts. I want to think that distributed transaction management should be like regular transaction management:

  1. If something goes wrong at any party (network, app error, timeout), I expect the whole multi-resource transaction not to be committed in any part of it. All leftovers should be cleaned up automatically sooner or later.
  2. If the transaction manager fails (file system fault, power supply fault), I expect all transactions under this TM to be rolled back (apparently, at the DB timeout level).
  3. File storage for tranlogs is optional if I don't want any automatic TX recovery (whatever that would mean).

Questions

Why can't I think like this? What's so complicated about 2PC?

What are the exact risks when I clear broken tranlogs?

If I am wrong and I really do need all the mess with 2PC file system state: don't you feel sick about the fact that a TX manager can actually break storage state in an easy and ugly manner?

When I was first confronted with two-phase commit in real life in 1994 (initially on a larger Oracle7 environment), I had a similar initial reaction. What a bloody shame that it is not generally possible to make it simple. But looking back at my university algorithm books, it became clear that there is no general solution for 2PC.

See, for instance, how to come to consensus in a distributed environment.

Of course, there are many specific cases where a 2PC transaction can more easily be resolved, either by completing it or by rolling it back entirely, and with less impact. But the general problem remains and cannot be solved.

In this case, a transaction manager has to decide at some point what to do; a transaction cannot remain open forever. Therefore, as a last resort, it will always need to go back to its own transaction logs, since one or more of the other parties may not be able to reliably communicate their status now or in the near future. Some transaction managers might be more advanced and know how to resolve some cases more easily, but the need for an ultimate fallback remains.
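To make the role of the log concrete, here is a minimal in-memory sketch of a coordinator that records its decision before starting phase 2 and uses that record to finish the job after a crash. The `Participant` and `Coordinator` classes, the `crash_after` parameter and the dict standing in for the tranlogs folder are all assumptions made for illustration; no real transaction manager is implemented this way.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    state: str = "active"  # active -> prepared -> committed / rolled_back

    def prepare(self) -> bool:
        self.state = "prepared"  # a promise to commit later if asked
        return True

    def commit(self) -> None:
        self.state = "committed"

    def rollback(self) -> None:
        self.state = "rolled_back"


@dataclass
class Coordinator:
    log: dict = field(default_factory=dict)  # stands in for the tranlogs folder

    def run(self, tx_id, participants, crash_after=None):
        if all(p.prepare() for p in participants):
            self.log[tx_id] = "commit"  # decision recorded before phase 2 starts
            for i, p in enumerate(participants):
                if i == crash_after:
                    return  # simulated power failure in the middle of phase 2
                p.commit()
        else:
            self.log[tx_id] = "rollback"
            for p in participants:
                p.rollback()

    def recover(self, tx_id, participants):
        decision = self.log[tx_id]  # the only place that still knows the outcome
        for p in participants:
            if p.state == "prepared":
                if decision == "commit":
                    p.commit()
                else:
                    p.rollback()


db, queue = Participant("db"), Participant("queue")
tm = Coordinator()
tm.run("tx1", [db, queue], crash_after=1)       # db commits, then the TM "dies"
state_after_crash = (db.state, queue.state)
tm.recover("tx1", [db, queue])                  # the log tells the TM to finish the commit
state_after_recovery = (db.state, queue.state)
```

After the simulated crash one participant has committed while the other is still only prepared; the prepared one cannot decide on its own, which is why the coordinator needs a durable record of what it decided.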

I am sorry for you. Fixing it in general seems to be equivalent to "falsity implies anything" in binary logic.

Summarizing

On "Why can't I think like this?" and "What's so complicated about 2PC?": see above. This algorithmic problem can't be solved universally.

On "What are the exact risks when I clear broken tranlogs?": the transaction manager has some database backing it. Deleting tranlogs poses the same problem as in general relational database software; you lose information on the transactions in progress. On some DB platforms the files may still be somewhat or largely intact. For background and some database theory, see Wikipedia.
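As an illustration of that risk, the sketch below resolves in-doubt resources once with the log and once with an empty log. It assumes a "presumed abort" rule (no log entry means roll back); that rule, the `recover` and `is_consistent` helpers and the resource names are hypothetical, and a real transaction manager might instead leave the branch in doubt for manual resolution.

```python
def recover(tranlog: dict, resources: dict) -> dict:
    """Resolve prepared (in-doubt) resources; a missing log entry means roll back."""
    resolved = dict(resources)
    for name, (tx_id, state) in resources.items():
        if state == "prepared":
            decision = tranlog.get(tx_id, "rollback")  # assumption: presumed abort
            final = "committed" if decision == "commit" else "rolled_back"
            resolved[name] = (tx_id, final)
    return resolved


def is_consistent(resources: dict) -> bool:
    """All branches of the same transaction must have ended the same way."""
    outcomes = {}
    for tx_id, state in resources.values():
        outcomes.setdefault(tx_id, set()).add(state)
    return all(len(states) == 1 for states in outcomes.values())


# State right after a crash in the middle of phase 2 of transaction "tx1":
# the accounts database has already committed, the queue is still in doubt.
resources = {"accounts_db": ("tx1", "committed"), "queue": ("tx1", "prepared")}
tranlog = {"tx1": "commit"}

with_log = recover(tranlog, resources)
without_log = recover({}, resources)  # the tranlogs folder was cleared
```

With the log both branches end up committed. Without it the queue is rolled back while the database keeps its commit, which is exactly the inconsistent state the question asks about.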

On "Don't you feel sick about the fact that TX manager can actually break storage state in an easy and ugly manner?": yes, sometimes when I have to get a lot of work done by the team, I really hate it. But well, it keeps me in a job :-)

Addition: to 2PC or not

From your addition I understand that you are considering whether or not to include 2PC in your projects.

In my opinion, your mileage may vary. Our company has a policy for 2PC: avoid it whenever possible. However, in some environments, especially with legacy systems and complex environments such as those found in banking, you cannot work around it. The customer requires it, and they may not be willing to allow you to make a major change to other infrastructure components.

When you must do 2PC: do it well. I like a clean architecture of software and infrastructure, something so simple that even five years from now it is clear how it works.

For all other cases, we stay away from two-phase commit. We have our own framework (Invantive Producer) that spans client, application server and database backend. In this framework we have chosen to sacrifice elements of ACID when working normally in a distributed environment. The application developer must take care of, for instance, atomicity himself. Often that is possible with little effort, or doesn't even require any thought. For instance, all software must be safe for restart. Even with atomic transactions, doing this well in a massively multi-user environment requires some thinking (for instance about locking issues).
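One common way to read "safe for restart" is that re-running a job after a failure must not apply the same work twice. The sketch below shows that idea with an idempotency check; the `apply_events` function, the event tuples and the `applied_ids` set are hypothetical and are not taken from the framework mentioned above. In a real system the balance update and the "already applied" marker would be written in the same local transaction.

```python
def apply_events(events, balances, applied_ids):
    """Apply each event at most once, so the whole job can simply be re-run."""
    for event_id, account, amount in events:
        if event_id in applied_ids:
            continue  # already handled during an earlier (possibly interrupted) run
        balances[account] = balances.get(account, 0) + amount
        applied_ids.add(event_id)


events = [("e1", "alice", 50), ("e2", "bob", 20), ("e3", "alice", -10)]
balances, applied_ids = {}, set()

apply_events(events[:2], balances, applied_ids)  # first run stops after two events
apply_events(events, balances, applied_ids)      # restart: replay the full input
```

The second run skips `e1` and `e2` and only applies `e3`, so restarting from the beginning is harmless.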

In general this stupid approach is very easy to understand and maintain. In cases where we have been required to do two-phase commit, we have been able to just replace some plug-ins in the framework and make some changes to client-side code.

So my advice would be:

  • Try to avoid 2PC.
  • But encapsulate your transaction logic nicely.
  • That allows you to do 2PC later without a complete rebuild, changing things only where needed.
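A rough sketch of what such encapsulation could look like: business code talks to a small strategy interface, so a 2PC-backed implementation could later be added as another subclass without touching the callers. The names `TransactionStrategy`, `BestEffortStrategy` and `transfer` are invented for this example, and using compensating actions as the non-2PC strategy is my own assumption, not a description of any particular framework.

```python
from abc import ABC, abstractmethod


class TransactionStrategy(ABC):
    @abstractmethod
    def execute(self, steps) -> bool:
        """Run (action, compensate) pairs; return True if all of them succeeded."""


class BestEffortStrategy(TransactionStrategy):
    """No 2PC: on failure, undo the completed steps in reverse order."""

    def execute(self, steps) -> bool:
        done = []
        try:
            for action, compensate in steps:
                action()
                done.append(compensate)
        except Exception:
            for compensate in reversed(done):
                compensate()
            return False
        return True


def transfer(strategy, ledger, amount, credit_ok=True):
    """Business logic only knows the strategy interface, not how it is implemented."""

    def debit():
        ledger["a"] -= amount

    def undo_debit():
        ledger["a"] += amount

    def credit():
        if not credit_ok:
            raise RuntimeError("target system unavailable")
        ledger["b"] += amount

    def undo_credit():
        ledger["b"] -= amount

    return strategy.execute([(debit, undo_debit), (credit, undo_credit)])


ledger = {"a": 100, "b": 0}
first = transfer(BestEffortStrategy(), ledger, 30)
second = transfer(BestEffortStrategy(), ledger, 30, credit_ok=False)
```

The first transfer succeeds; the second fails on the credit step and the debit is compensated, so the ledger is left as it was after the first transfer. Note that compensation is weaker than a real rollback, since other users may see the intermediate state.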

I hope this helps you. If you can tell me more about your typical environments (number of tables, GB of persistent data, number of concurrent users, typical transaction management software and platform), maybe I can make some additions or improvements.

Addition: Email and avoiding message loss in 2PC

Regarding whether I suggest combining DB with JMS: no, combining DB with JMS is normally of little use; it will itself already have some DB behind it, hence the original question on transaction logs.

Regarding your business case: I understand that per event an email is sent from a template and that the outgoing mail is registered as an event in the database.

This is a hard nut to crack; I've enjoyed doing security audits, and one of the easiest security issues to score on was checking the use of email.

Email, besides being like a postcard in most situations (neither confidential nor tamper-proof), has no guarantees of delivery and/or reading without additional measures. For instance, even when email is delivered directly between your mail transfer agent and the recipient, data loss can occur without the transaction monitor being informed. It gets even worse when multiple hops are involved. For instance, each MTA has its own queueing mechanism on which a "bomb can be dropped", leading to data loss. But you can also think of spam measures, bad configuration, mail loops, deleting a file by accident, etc. Even when you can register the sending of the email without any loss of transaction information using 2PC, this gives absolutely no clue as to whether the email will arrive at all, or even make it across the first hop.

The company I work for sells a large software package for project-driven businesses. This package has an integrated queueing mechanism, which also handles email events. Nowadays it is combined with Exchange in most implementations. A few months ago we had a nice problem: transaction started, mail channel opened, mail delivered to Exchange as MTA, mail registered as handled... transaction aborted, since the Oracle tablespace was full. On the next run, the mail was delivered to Exchange again, the transaction aborted again, etc. The algorithm has been enhanced now, but from this simple example you can see that you need all endpoints to cooperate in your 2PC, even when some of the endpoints are far away in an organisation receiving and displaying your email.
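The redelivery loop can be reproduced in a few lines. The sketch below contrasts a naive queue run with one possible guard: record the send attempt in its own, separately committed step before handing the mail to the MTA. This is only an illustration under my own assumptions (the `naive_run` and `guarded_run` functions, the `attempted` set, and the idea that the marker can still be stored while the main transaction fails); it is not the enhancement that was actually made to the package.

```python
class TablespaceFull(Exception):
    pass


def naive_run(outbox, mta, db_full):
    for msg_id in list(outbox):
        mta.append(msg_id)          # mail handed to the MTA: cannot be rolled back
        if db_full:
            raise TablespaceFull()  # "handled" flag is rolled back, message stays queued
        outbox.remove(msg_id)


def guarded_run(outbox, mta, attempted, db_full):
    for msg_id in list(outbox):
        if msg_id not in attempted:
            attempted.add(msg_id)   # assumption: committed on its own, before sending
            mta.append(msg_id)
        if db_full:
            raise TablespaceFull()
        outbox.remove(msg_id)


def run_twice(job):
    """Simulate two scheduled runs while the tablespace is still full."""
    for _ in range(2):
        try:
            job()
        except TablespaceFull:
            pass


naive_outbox, naive_mta = ["m1"], []
run_twice(lambda: naive_run(naive_outbox, naive_mta, db_full=True))

guarded_outbox, guarded_mta, attempted = ["m1"], [], set()
run_twice(lambda: guarded_run(guarded_outbox, guarded_mta, attempted, db_full=True))
```

The naive version hands the same mail to the MTA on every run, while the guarded version sends it once. The price is at-most-once behaviour: if the process dies between setting the marker and the actual send, the mail is never sent, so the choice depends on which failure is less harmful.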

If you need to ensure that an email is delivered or read, you will need to supplement it with additional measures. Please pick from the application controls, user controls and process controls described in the literature.
