How to read multiple CSV files in Spring Batch to merge data for processing?

Question

I'm new to Spring Batch and am looking for guidance on the requirement below.

Overall Requirement:

I have to get data from different systems, apply some business logic, and save the result in a database.

Below is an example.

I need to read data from 3 CSV files.
First file – person.csv – contains the name and ID of each person.
Second file – address.csv – contains address info for each person. One person can have zero or multiple addresses.
Third file – employment.csv – contains employment info for each person. One person can have zero or multiple employers.

Here is a sample.

Person.csv (total size is 8 million)

"personID", "personName"
1, Joey
2, Chandler
3, Ross
4, Monica

Address.csv

"personID", "addressType", "state"
1, residence, NY
1, mailing, NC
2, residence, NY
4, residence, NY
4, mailing, DC

Employment.csv

"personID", "employerName"
1, emp1
2, emp2
2, emp3
3, emp4

Note: each file is sorted by person ID.

To apply the business logic, I need to merge the data for each person, i.e., combine the person, address, and employment data for one person before applying the logic. Can you suggest an approach for this?

Answer 1

This sounds like a 4-step job. You'll have to decide where the intermediate results of steps 1 to 3 should reside.

If the data from all the CSV files will fit in memory, then the intermediate results of steps 1 to 3 could just be a Map, with personID as the key. If not, then the intermediate results of steps 1 to 3 should probably be written to a temp table in the database.

Assuming all data will fit in memory, create a bean which can be injected into the ItemWriters of steps 1 to 3, for example:

// In a @Configuration class.
// Assumes personID is of type Long and that the Person class
// has the appropriate attributes.
@Bean
public Map<Long, Person> people() {
    return new HashMap<>();
}
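The Person class is only described as having "appropriate attributes". A minimal sketch of what that could look like is below. The class and field names are hypothetical, chosen to follow the CSV headers in the question; they are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical value class for one Address.csv row (personID is the map key, so it is not repeated here).
final class Address {
    final String addressType;
    final String state;

    Address(String addressType, String state) {
        this.addressType = addressType;
        this.state = state;
    }
}

// Hypothetical value class for one Employment.csv row.
final class Employment {
    final String employerName;

    Employment(String employerName) {
        this.employerName = employerName;
    }
}

// Aggregate that steps 2 and 3 enrich: one person with zero or more addresses and employers.
final class Person {
    final long personId;
    final String personName;
    private final List<Address> addresses = new ArrayList<>();
    private final List<Employment> employments = new ArrayList<>();

    Person(long personId, String personName) {
        this.personId = personId;
        this.personName = personName;
    }

    void addAddress(Address address) {
        addresses.add(address);
    }

    void addEmployment(Employment employment) {
        employments.add(employment);
    }

    // Read-only views, so the business logic in step 4 cannot modify the lists by accident.
    List<Address> getAddresses() {
        return Collections.unmodifiableList(addresses);
    }

    List<Employment> getEmployments() {
        return Collections.unmodifiableList(employments);
    }
}
```

With 8 million people held in memory, keeping these objects small (few fields, no duplicated IDs) is probably worth the effort.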

Step 1:

ItemReader - reads the next Person.csv row and creates a Person instance
ItemProcessor - nothing to do; passes the Person instance to the ItemWriter
ItemWriter - adds the Person instance to the people Map (or intermediate table)
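The reader and writer logic of step 1 can be sketched in plain Java as follows. This is not Spring Batch code: PersonStep, parse and write are hypothetical names, and in a real job the parsing would more likely be done by a FlatFileItemReader with a field mapper, with the header line skipped by the reader. The naive split below assumes unquoted values without embedded commas, as in the sample data.

```java
import java.util.List;
import java.util.Map;

// Minimal stand-in for the Person class.
final class Person {
    final long personId;
    final String personName;

    Person(long personId, String personName) {
        this.personId = personId;
        this.personName = personName;
    }
}

final class PersonStep {
    // Reader side: turn one data row such as "1, Joey" into a Person.
    // The header row must be skipped by the caller.
    static Person parse(String line) {
        String[] fields = line.split(",", -1);
        if (fields.length != 2) {
            throw new IllegalArgumentException("Expected 2 fields but got: " + line);
        }
        return new Person(Long.parseLong(fields[0].trim()), fields[1].trim());
    }

    // Writer side: add one chunk of Person items to the shared map.
    // Rejecting duplicate IDs is an assumed policy, not something the answer specifies.
    static void write(List<Person> chunk, Map<Long, Person> people) {
        for (Person person : chunk) {
            Person existing = people.putIfAbsent(person.personId, person);
            if (existing != null) {
                throw new IllegalStateException("Duplicate personID " + person.personId);
            }
        }
    }
}
```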

Step 2:

ItemReader - reads the next Address.csv row and creates an Address instance
ItemProcessor - nothing to do; passes the Address instance to the ItemWriter
ItemWriter - adds the Address to the related Person from the people Map (or intermediate table). TODO: what should happen if there is an Address for a person that does not exist?
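One possible answer to the TODO is to set orphaned rows aside for later reporting instead of failing the step. The sketch below shows that policy in plain Java; AddressAttacher and its methods are hypothetical names, and whether to skip, collect, or fail on an orphan is a business decision the original answer leaves open. The same shape would apply to Employment rows in step 3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal stand-ins for the domain classes.
final class Address {
    final long personId;
    final String addressType;
    final String state;

    Address(long personId, String addressType, String state) {
        this.personId = personId;
        this.addressType = addressType;
        this.state = state;
    }
}

final class Person {
    final long personId;
    final List<Address> addresses = new ArrayList<>();

    Person(long personId) {
        this.personId = personId;
    }
}

// Writer-side logic for step 2: attach each Address to its Person.
final class AddressAttacher {
    private final Map<Long, Person> people;
    private final List<Address> orphans = new ArrayList<>();

    AddressAttacher(Map<Long, Person> people) {
        this.people = people;
    }

    void write(List<Address> chunk) {
        for (Address address : chunk) {
            Person owner = people.get(address.personId);
            if (owner == null) {
                // Assumed policy: keep the row for reporting rather than aborting the job.
                orphans.add(address);
            } else {
                owner.addresses.add(address);
            }
        }
    }

    List<Address> getOrphans() {
        return orphans;
    }
}
```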

Step 3:

ItemReader - reads the next Employment.csv row and creates an Employment instance
ItemProcessor - nothing to do; passes the Employment instance to the ItemWriter
ItemWriter - adds the Employment to the related Person from the people Map (or intermediate table). TODO: what should happen if there is an Employment for a person that does not exist?

Since there is nothing for the ItemProcessor to do in steps 1 to 3, it might be better to use a Tasklet.

Also, steps 1 to 3 could be run in parallel. That would probably improve performance, but it adds complexity to ensure people is populated correctly: for example, a plain HashMap is not thread-safe, and an Address could arrive before its Person has been added.
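To illustrate the extra care parallel loading needs, here is a rough plain-Java sketch, not Spring Batch code. It assumes a ConcurrentHashMap in place of the HashMap and uses computeIfAbsent so that whichever loader sees a personID first creates the entry. PersonRecord and ParallelLoad are hypothetical names, and the rows are assumed to be already split into fields.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PersonRecord {
    final long personId;
    // Filled in by the person loader, which may run after address or employment rows arrive.
    volatile String personName;
    final List<String> addressStates = Collections.synchronizedList(new ArrayList<String>());
    final List<String> employers = Collections.synchronizedList(new ArrayList<String>());

    PersonRecord(long personId) {
        this.personId = personId;
    }
}

final class ParallelLoad {
    // Atomically fetch the record for this ID, creating it if no loader has seen it yet.
    private static PersonRecord entry(Map<Long, PersonRecord> people, String rawId) {
        return people.computeIfAbsent(Long.parseLong(rawId.trim()), PersonRecord::new);
    }

    static Map<Long, PersonRecord> load(List<String[]> personRows,
                                        List<String[]> addressRows,
                                        List<String[]> employmentRows)
            throws InterruptedException, ExecutionException {
        Map<Long, PersonRecord> people = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(3);
        try {
            List<Future<?>> loads = new ArrayList<>();
            loads.add(pool.submit(() -> {
                for (String[] row : personRows) {
                    entry(people, row[0]).personName = row[1].trim();
                }
            }));
            loads.add(pool.submit(() -> {
                for (String[] row : addressRows) {
                    entry(people, row[0]).addressStates.add(row[2].trim());
                }
            }));
            loads.add(pool.submit(() -> {
                for (String[] row : employmentRows) {
                    entry(people, row[0]).employers.add(row[1].trim());
                }
            }));
            // Wait for all three loaders; get() rethrows any failure from a loader.
            for (Future<?> load : loads) {
                load.get();
            }
        } finally {
            pool.shutdown();
        }
        return people;
    }
}
```

After loading, any record whose personName is still null belongs to an address or employment row with no matching person, which is one way to detect the orphan cases flagged in the TODOs.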

Step 4:

ItemReader - reads the next element of people (or a composite object from the intermediate tables)
ItemProcessor - applies the business logic
ItemWriter - writes the result to the database
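The read, process, write cycle of step 4 can be sketched in plain Java as below. Step4Sketch and run are hypothetical names, and the loop only imitates what a chunk-oriented step does for you; in a real job the framework drives this loop, and an iterator-backed reader such as Spring Batch's IteratorItemReader over people.values() could likely serve as the ItemReader.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Function;

final class Step4Sketch {
    // Reads every merged item, applies the business logic, and hands results
    // to the writer in chunks of at most chunkSize. Returns the number of items written.
    static <I, O> int run(Map<Long, I> people,
                          Function<I, O> processor,
                          Consumer<List<O>> writer,
                          int chunkSize) {
        Iterator<I> reader = people.values().iterator();
        List<O> chunk = new ArrayList<>(chunkSize);
        int written = 0;
        while (reader.hasNext()) {
            O result = processor.apply(reader.next());
            if (result != null) {
                // A null result means "filter this item out", as with an ItemProcessor.
                chunk.add(result);
            }
            if (chunk.size() == chunkSize) {
                writer.accept(new ArrayList<>(chunk));
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            // Flush the final, possibly smaller, chunk.
            writer.accept(new ArrayList<>(chunk));
            written += chunk.size();
        }
        return written;
    }
}
```

Writing in chunks rather than one row at a time is what lets the database writer batch its inserts.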

Answer 1 was posted on 2019-09-18 17:02:45.
