
Spring Batch Job Design - Multiple Readers

I'm struggling with how to design a Spring Batch job. The overall goal is to retrieve ~20 million records and save them to a SQL database.

I'm doing it in two parts. First, I retrieve the 20 million IDs of the records I want and save those to a file (or DB). This is a relatively fast operation. Second, I loop through my file of IDs, taking batches of 2,000, and retrieve their related records from an external service. I repeat this, 2,000 IDs at a time, until I've retrieved all of the records. Each batch of 2,000 records I retrieve is saved to a database.

Some may be asking why I'm doing this in two steps. I eventually plan to make the second step run in parallel so that I can retrieve batches of 2,000 records concurrently and hopefully greatly speed up the download. Having the IDs allows me to partition the job into batches. For now, let's not worry about parallelism and just focus on how to design a simpler sequential job.

Imagine I have already solved the first problem of saving all of the IDs locally. They are in a file, one ID per line. How do I design the steps for the second part?

Here's what I'm thinking…

Read 2,000 IDs using a flat file reader. I'll need an aggregator since I only want to do one query to my external service for each batch of 2,000 IDs. This is where I'm struggling. Do I nest a series of readers? Or can I do the 'reading' in the processor or writer?

Essentially, my problem is that I want to read lines from a file, aggregate those lines, and then immediately do another 'read' to retrieve the respective records. I almost want to chain readers together.

Finally, once I've retrieved the records from the external service, I'll have a List of records, which means that when they arrive at the Writer, I'll have a list of lists. I want a flat list of objects so that I can use the JdbcItemWriter out of the box.

Thoughts? Hopefully that makes sense.

Andrew

This is a matter of design and is subjective, but based on the Spring Batch example I found (from SpringSource) and my personal experience, doing additional reading in the processor step is a good solution to this problem. You can also chain together multiple processors/readers in the 'processor' step. So, while the names don't exactly match, I find myself doing more and more 'reading' in my processors.

http://docs.spring.io/spring-batch/trunk/reference/html/patterns.html#drivingQueryBasedItemReaders
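A minimal sketch of "reading in the processor", written in plain Java so it runs without Spring Batch on the classpath. The `ItemProcessor` interface below is a local stand-in that mirrors the shape of Spring Batch's interface of the same name; `FetchedRecord`, `RecordLookup`, and `findById` are hypothetical names introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in mirroring the shape of Spring Batch's ItemProcessor.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

// Hypothetical record type returned by the lookup.
final class FetchedRecord {
    final long id;
    final String payload;

    FetchedRecord(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

// Hypothetical lookup: returns the record for one ID, or null if unknown.
interface RecordLookup {
    FetchedRecord findById(Long id) throws Exception;
}

// The reader supplies only IDs; the processor performs the second "read".
final class LookupItemProcessor implements ItemProcessor<Long, FetchedRecord> {
    private final RecordLookup lookup;

    LookupItemProcessor(RecordLookup lookup) {
        this.lookup = lookup;
    }

    @Override
    public FetchedRecord process(Long id) throws Exception {
        // In Spring Batch, returning null from a processor filters the item out.
        return lookup.findById(id);
    }
}

final class LookupProcessorDemo {
    public static void main(String[] args) throws Exception {
        // An in-memory map stands in for the external service.
        Map<Long, FetchedRecord> store = new HashMap<>();
        store.put(7L, new FetchedRecord(7L, "seven"));

        ItemProcessor<Long, FetchedRecord> processor = new LookupItemProcessor(store::get);
        System.out.println(processor.process(7L).payload); // prints "seven"
    }
}
```

Note that a processor sees one item at a time, so this sketch makes one lookup per ID. That fits the driving-query pattern, but it does not by itself give the single call per 2,000 IDs that the question asks for.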

Given that you want to call your external service just once per chunk of 2,000 records, you'll actually want to do this service call in an ItemWriter. That is the standard recommended way to do chunk-level processing.

You can create a custom ItemWriter<Long> implementation. It will receive the list of 2,000 IDs as input and call the external service. The result from the external service should allow you to create a List<Item>. Your writer can then simply forward this List<Item> to your JdbcItemWriter<Item> delegate.
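A minimal sketch of such a delegating writer, in plain Java so it runs without Spring Batch on the classpath. The `ItemWriter` interface below is a local stand-in that mirrors the `write(List<? extends T>)` shape used by Spring Batch 4.x (Spring Batch 5 passes a `Chunk` instead); `Item`, `ExternalService`, and `fetchByIds` are hypothetical names, and an in-memory list stands in for the JDBC delegate.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in mirroring the Spring Batch 4.x ItemWriter shape.
interface ItemWriter<T> {
    void write(List<? extends T> items) throws Exception;
}

// Hypothetical record type returned by the external service.
final class Item {
    final long id;
    final String payload;

    Item(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

// Hypothetical client: one call returns the records for a whole batch of IDs.
interface ExternalService {
    List<Item> fetchByIds(List<Long> ids) throws Exception;
}

// Receives one chunk of IDs, makes a single service call, and forwards the
// resulting flat list of items to the delegate writer.
final class ServiceLookupItemWriter implements ItemWriter<Long> {
    private final ExternalService service;
    private final ItemWriter<Item> delegate;

    ServiceLookupItemWriter(ExternalService service, ItemWriter<Item> delegate) {
        this.service = service;
        this.delegate = delegate;
    }

    @Override
    public void write(List<? extends Long> ids) throws Exception {
        if (ids.isEmpty()) {
            return; // nothing to look up
        }
        List<Item> items = service.fetchByIds(new ArrayList<>(ids));
        delegate.write(items); // the delegate sees a flat List<Item>, not a list of lists
    }
}

final class ServiceLookupWriterDemo {
    public static void main(String[] args) throws Exception {
        List<Item> saved = new ArrayList<>();

        // Fake service that fabricates one record per requested ID.
        ExternalService fake = ids -> {
            List<Item> out = new ArrayList<>();
            for (Long id : ids) {
                out.add(new Item(id, "record-" + id));
            }
            return out;
        };

        // In-memory delegate standing in for the JDBC writer.
        ItemWriter<Item> inMemory = saved::addAll;

        new ServiceLookupItemWriter(fake, inMemory).write(List.of(1L, 2L, 3L));
        System.out.println(saved.size()); // prints 3
    }
}
```

In a real step the chunk size (the commit interval) would be set to 2,000 so that each `write` call receives one batch of IDs. The delegate would then be the stock JDBC writer, which in Spring Batch is the `JdbcBatchItemWriter` class.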
