Spring Batch Job Design - Multiple Readers

I'm struggling with how to design a Spring Batch job. The overall goal is to retrieve ~20 million records and save them to a SQL database.

I'm doing it in two parts. First, I retrieve the 20 million IDs of the records I want and save them to a file (or database). This is a relatively fast operation. Second, I loop through my file of IDs in batches of 2,000 and retrieve the related records from an external service. I repeat this, 2,000 IDs at a time, until I've retrieved all of the records. Each batch of 2,000 records I retrieve is then saved to a database.

Some may ask why I'm doing this in two steps. I eventually plan to run the second step in parallel, so that I can retrieve batches of 2,000 records concurrently and hopefully speed up the download considerably. Having the IDs allows me to partition the job into batches. For now, let's not worry about parallelism and just focus on how to design a simpler sequential job.

Imagine I have already solved the first problem of saving all of the IDs locally. They are in a file, one ID per line. How do I design the steps for the second part?

Here's what I'm thinking…

Read 2,000 IDs using a flat file reader. I'll need an aggregator, since I only want to make one query to my external service for each batch of 2,000 IDs. This is where I'm struggling. Do I nest a series of readers? Or can I do the 'reading' in the processor or writer?

Essentially, my problem is that I want to read lines from a file, aggregate those lines, and then immediately do another 'read' to retrieve the respective records. I almost want to chain readers together.
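One way to picture the "aggregate, then read again" idea is a reader that wraps the line-by-line reader and returns a whole batch per call. The sketch below is plain Java so that it compiles without Spring on the classpath: `SimpleReader` is a hypothetical stand-in for Spring Batch's `ItemReader` (simplified to omit the checked exception), and `AggregatingReader` is an illustrative name, not a Spring Batch class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the ItemReader contract:
// read() returns the next item, or null once the input is exhausted.
interface SimpleReader<T> {
    T read();
}

// Wraps a one-item-at-a-time reader and returns up to batchSize items per call.
class AggregatingReader<T> implements SimpleReader<List<T>> {
    private final SimpleReader<T> delegate;
    private final int batchSize;

    AggregatingReader(SimpleReader<T> delegate, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.delegate = delegate;
        this.batchSize = batchSize;
    }

    @Override
    public List<T> read() {
        List<T> batch = new ArrayList<>(batchSize);
        T item;
        // The size check comes first, so no item is read and then dropped.
        while (batch.size() < batchSize && (item = delegate.read()) != null) {
            batch.add(item);
        }
        // An empty batch means the delegate is exhausted: signal end of input.
        return batch.isEmpty() ? null : batch;
    }
}
```

With a batch size of 2,000, each `read()` would hand the next stage one list of IDs, and the final list may be shorter than 2,000.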

Finally, once I've retrieved the records from the external service, I'll have a List of records. That means that when they arrive at the writer, I'll have a list of lists. I want a flat list of objects so that I can use the JdbcBatchItemWriter out of the box.
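The list-of-lists problem can be handled by a thin writer that flattens each chunk before delegating. This is a minimal sketch, assuming the Spring Batch 4.x writer contract of `write(List<? extends T>)` (5.x passes a `Chunk` instead). `SimpleWriter` and `FlatteningWriter` are hypothetical names used so the example compiles without Spring; in a real job the delegate would be the JDBC writer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the ItemWriter contract: one call per chunk.
interface SimpleWriter<T> {
    void write(List<? extends T> items);
}

// Receives a chunk whose items are themselves lists, and forwards
// a single flat list to the wrapped writer.
class FlatteningWriter<T> implements SimpleWriter<List<T>> {
    private final SimpleWriter<T> delegate;

    FlatteningWriter(SimpleWriter<T> delegate) {
        this.delegate = delegate;
    }

    @Override
    public void write(List<? extends List<T>> batches) {
        List<T> flat = new ArrayList<>();
        for (List<T> batch : batches) {
            flat.addAll(batch);
        }
        delegate.write(flat);
    }
}
```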

Thoughts? Hopefully that makes sense.

Andrew

This is a matter of design and is subjective, but based on the Spring Batch example I found (from SpringSource) and my personal experience, doing additional reading in the processor step is a good solution to this problem. You can also chain together multiple processors/readers in the 'processor' step. So, while the names don't exactly match, I find myself doing more and more 'reading' in my processors.
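As a rough illustration of "reading" inside a processor, the sketch below turns one batch of IDs into the matching records with a single service call. It is plain Java under stated assumptions: `SimpleProcessor` is a hypothetical stand-in for Spring Batch's `ItemProcessor<I, O>` (whose real `process` method also declares `throws Exception`), and `RecordService` and `fetchByIds` are invented names for the external service client.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the ItemProcessor contract.
interface SimpleProcessor<I, O> {
    O process(I item);
}

// Hypothetical client for the external service; one call fetches many records.
interface RecordService<R> {
    List<R> fetchByIds(List<Long> ids);
}

// The "item" here is a whole batch of IDs, so the lookup runs once per batch.
class LookupProcessor<R> implements SimpleProcessor<List<Long>, List<R>> {
    private final RecordService<R> service;

    LookupProcessor(RecordService<R> service) {
        this.service = service;
    }

    @Override
    public List<R> process(List<Long> ids) {
        // Copy the result so later stages cannot mutate the service's own list.
        return new ArrayList<>(service.fetchByIds(ids));
    }
}
```

For chaining several such stages in a real job, Spring Batch provides `CompositeItemProcessor`, which runs a list of delegate processors in order.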


Given that you want to call your external service just once per chunk of 2,000 records, you'll actually want to make this service call in an ItemWriter. That is the standard recommended way to do chunk-level processing.

You can create a custom ItemWriter<Long> implementation. It will receive the list of 2,000 IDs as input and call the external service. The result from the external service should allow you to create a List<Item>. Your writer can then simply forward this List<Item> to your JdbcBatchItemWriter<Item> delegate.
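A minimal sketch of that writer, written in plain Java so it compiles without Spring. `SimpleWriter` is a hypothetical stand-in for `ItemWriter` with the Spring Batch 4.x signature `write(List<? extends T>)` (5.x passes a `Chunk` instead), and `RecordService` with its `fetchByIds` method is an invented name for the external service client. In a real job the delegate would be the JDBC writer, and the step's chunk size would be set to 2,000 so that each `write` call receives one batch of IDs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the ItemWriter contract: one call per chunk.
interface SimpleWriter<T> {
    void write(List<? extends T> items);
}

// Hypothetical client for the external service.
interface RecordService<R> {
    List<R> fetchByIds(List<Long> ids);
}

// Receives a chunk of IDs, fetches their records in one call,
// and forwards the flat list of records to the delegate writer.
class FetchingWriter<R> implements SimpleWriter<Long> {
    private final RecordService<R> service;
    private final SimpleWriter<R> delegate;

    FetchingWriter(RecordService<R> service, SimpleWriter<R> delegate) {
        this.service = service;
        this.delegate = delegate;
    }

    @Override
    public void write(List<? extends Long> ids) {
        // Exactly one service call for the whole chunk.
        List<R> records = service.fetchByIds(new ArrayList<Long>(ids));
        delegate.write(records);
    }
}
```

Because the delegate receives a flat list of records, no list of lists ever reaches the database writer.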
