How does Spring Batch manage transactions (with possibly multiple datasources)?

I would like some information about the data flow in Spring Batch processing, but I have failed to find what I am looking for on the Internet (despite some useful questions on this site).

I am trying to establish standards for using Spring Batch in our company, and we are wondering how Spring Batch behaves when several processors in a step update data on different data sources.

This question focuses on chunk-oriented processing, but feel free to provide information on other modes.

From what I have seen (please correct me if I am wrong), when a line is read, it goes through the whole flow (reader, processors, writer) before the next one is read. This is as opposed to silo processing, where the reader would process all lines, send them to the processor, and so on.
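For comparison, the Spring Batch reference documentation describes the chunk-oriented loop a little differently. Items are read one at a time until the commit interval is reached. The buffered items are then passed through the processor, and the whole list is handed to the writer in a single call.

The plain-Java sketch below simulates that ordering and records each call, so the sequence is visible. It makes two assumptions: ChunkFlowDemo is a hypothetical name, and an upper-casing function stands in for the ItemProcessor. No Spring classes are used.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java simulation of the chunk-oriented read/process/write order (no Spring classes).
public class ChunkFlowDemo {

    // Runs one "step" over the input and returns the calls in the order they happen.
    public static List<String> trace(List<String> input, int commitInterval) {
        List<String> log = new ArrayList<>();
        Iterator<String> reader = input.iterator();             // stands in for the ItemReader
        Function<String, String> processor = String::toUpperCase; // hypothetical ItemProcessor

        while (reader.hasNext()) {
            // 1. Read items one at a time until the commit interval is reached.
            List<String> items = new ArrayList<>();
            while (items.size() < commitInterval && reader.hasNext()) {
                String item = reader.next();
                log.add("read " + item);
                items.add(item);
            }
            // 2. Pass every buffered item through the processor.
            List<String> processed = new ArrayList<>();
            for (String item : items) {
                processed.add(processor.apply(item));
                log.add("process " + item);
            }
            // 3. Hand the whole chunk to the writer in one call.
            log.add("write " + processed);
        }
        return log;
    }

    public static void main(String[] args) {
        trace(List.of("a", "b", "c"), 2).forEach(System.out::println);
    }
}
```

With three items and a commit interval of 2, the trace runs as follows: both reads of the first chunk happen before either process call, followed by a single write of [A, B]. The same pattern then repeats for the last item.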

In my case, several processors read data (in different databases) and update it along the way, and finally the writer inserts data into yet another database. For now, the JobRepository is not backed by a database. If it were, that would be a separate one, making things even more complex.

This model cannot be changed, since the data belongs to several business areas.

How is the transaction managed in this case? Is the data committed only once the full chunk is processed? And is there two-phase commit management? If so, how is it ensured? What development or configuration is needed to ensure data consistency?

More generally, what would you recommend in a similar case?

Spring Batch uses Spring core transaction management, with most of the transaction semantics arranged around a chunk of items, as described in section 5.1 of the Spring Batch docs.

The transactional behaviour of the readers and writers depends on exactly what they are (e.g. file system, database, JMS queue). However, if a resource is configured to support transactions, Spring enlists it automatically. The same goes for XA: if you make the resource endpoints XA-compliant (and use a JTA transaction manager), two-phase commit will be used across them.
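The toy model below illustrates what "two-phase commit" means here: the protocol a JTA transaction manager runs across XA resources. First, every participant is asked to prepare. Only if all of them vote yes is each one told to commit; otherwise all of them are rolled back.

TwoPhaseCommitDemo and Participant are hypothetical names, not the javax.transaction.xa API. The sketch also deliberately omits the coordinator's recovery log and the timeouts that a real transaction manager needs.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of two-phase commit; Participant is a hypothetical stand-in for an XA resource.
public class TwoPhaseCommitDemo {

    static final class Participant {
        final String name;
        final boolean canPrepare; // simulates whether this resource votes "yes"
        String state = "ACTIVE";

        Participant(String name, boolean canPrepare) {
            this.name = name;
            this.canPrepare = canPrepare;
        }

        boolean prepare() {
            state = canPrepare ? "PREPARED" : "FAILED";
            return canPrepare;
        }

        void commit() { state = "COMMITTED"; }

        void rollback() { state = "ROLLED_BACK"; }
    }

    // Phase 1: ask everyone to prepare. Phase 2: commit all, or roll back all.
    static boolean run(List<Participant> participants) {
        boolean allPrepared = true;
        for (Participant p : participants) {
            if (!p.prepare()) {
                allPrepared = false;
                break; // a single "no" vote aborts the whole transaction
            }
        }
        for (Participant p : participants) {
            if (allPrepared) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allPrepared;
    }

    // Helper: builds participants from the given votes and returns their final states.
    static List<String> finalStates(boolean... votes) {
        List<Participant> ps = new ArrayList<>();
        for (int i = 0; i < votes.length; i++) {
            ps.add(new Participant("db" + i, votes[i]));
        }
        run(ps);
        List<String> states = new ArrayList<>();
        for (Participant p : ps) {
            states.add(p.state);
        }
        return states;
    }

    public static void main(String[] args) {
        List<Participant> ps = List.of(
                new Participant("sourceDbA", true),
                new Participant("sourceDbB", false),
                new Participant("targetDb", true));
        System.out.println("committed = " + run(ps));
        ps.forEach(p -> System.out.println(p.name + " -> " + p.state));
    }
}
```

In the main example, sourceDbB votes no, so all three resources end up rolled back. This includes targetDb, which was never asked to prepare.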

Getting back to the chunk transaction: Spring Batch sets up one transaction per chunk. Suppose you set the commit interval (the commit-interval attribute) to 5 on a given tasklet. Spring Batch then opens a new transaction, which includes all resources managed by the transaction manager. It reads 5 items, processes and writes them, and commits, repeating this until the input is exhausted.
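Here is a plain-Java simulation of that per-chunk boundary. It assumes a hypothetical in-memory FakeTx resource in place of a real PlatformTransactionManager and database; ChunkTransactionDemo is likewise an illustrative name.

With 12 items and a commit interval of 5, the step commits three times (5, 5 and 2 items). An exception on item 8 discards only the second chunk, leaving the first five items committed.

```java
import java.util.ArrayList;
import java.util.List;

// Simulates one transaction per chunk; FakeTx is a hypothetical transactional resource.
public class ChunkTransactionDemo {

    static final class FakeTx {
        final List<Integer> committed = new ArrayList<>();
        private final List<Integer> pending = new ArrayList<>();
        int commits = 0;

        void write(int item) { pending.add(item); }

        void commit() {
            committed.addAll(pending); // writes become visible only at commit
            pending.clear();
            commits++;
        }

        void rollback() { pending.clear(); } // discards this chunk's writes only
    }

    // Processes items in chunks of commitInterval; failOn < 0 means no item fails.
    static FakeTx runStep(List<Integer> items, int commitInterval, int failOn) {
        FakeTx tx = new FakeTx();
        for (int start = 0; start < items.size(); start += commitInterval) {
            List<Integer> chunk = items.subList(start, Math.min(start + commitInterval, items.size()));
            try {
                // A new chunk transaction starts here (nothing is pending).
                for (int item : chunk) {
                    if (item == failOn) {
                        throw new IllegalStateException("failed on item " + item);
                    }
                    tx.write(item);
                }
                tx.commit();
            } catch (IllegalStateException e) {
                tx.rollback(); // earlier chunks stay committed
                break;         // with no skip/retry configured, the step stops here
            }
        }
        return tx;
    }

    // Helper: the list 1..n.
    static List<Integer> numbers(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        FakeTx ok = runStep(numbers(12), 5, -1);
        System.out.println(ok.commits + " commits, committed " + ok.committed);
        FakeTx failed = runStep(numbers(12), 5, 8);
        System.out.println(failed.commits + " commits, committed " + failed.committed);
    }
}
```

The break mirrors a step with no skip or retry configuration: the step stops on the first failure, and chunks that were already committed are not undone.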

But all of this is set up around reading from a single data source; does that meet your requirement? I'm not sure Spring Batch can manage a single transaction that reads data from multiple sources and writes the processor result into another database. (In fact, I can't think of anything that could do that...)
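A plain-Java sketch shows why one transaction spanning several databases is hard without XA. The hypothetical LocalTx class stands in for a single database's local transaction (SeparateCommitsDemo is also an illustrative name). The source and target are committed one after the other, so if the second commit fails, the first cannot be taken back and the two databases disagree. The two-phase protocol that XA-compliant resources use is designed to close exactly this window.

```java
import java.util.ArrayList;
import java.util.List;

// Two independent local transactions committed in sequence, with no XA coordinator.
public class SeparateCommitsDemo {

    static final class LocalTx {
        final List<String> committed = new ArrayList<>();
        private final List<String> pending = new ArrayList<>();
        private final boolean failOnCommit; // simulates a commit failure in this database

        LocalTx(boolean failOnCommit) { this.failOnCommit = failOnCommit; }

        void write(String row) { pending.add(row); }

        void commit() {
            if (failOnCommit) {
                pending.clear(); // the database rolls this transaction back
                throw new IllegalStateException("commit failed");
            }
            committed.addAll(pending);
            pending.clear();
        }
    }

    // Updates a source and a target database for the same item and reports
    // whether both end up with matching committed data.
    static boolean consistentAfter(boolean targetCommitFails) {
        LocalTx source = new LocalTx(false);
        LocalTx target = new LocalTx(targetCommitFails);
        source.write("item-1 marked processed");
        target.write("item-1 inserted");
        source.commit(); // succeeds and is now permanent
        try {
            target.commit();
        } catch (IllegalStateException e) {
            // Nothing rolls back the source automatically at this point.
        }
        return source.committed.size() == target.committed.size();
    }

    public static void main(String[] args) {
        System.out.println("both commits succeed -> consistent = " + consistentAfter(false));
        System.out.println("target commit fails  -> consistent = " + consistentAfter(true));
    }
}
```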
