How to read multiple CSV files in Spring Batch to merge the data for processing?
I'm new to Spring Batch and am looking for guidance on the following requirement.
I have to get data from different systems, apply some business logic, and save the result in a DB.
Below is an example.
I need to read data from 3 CSV files:
First file, person.csv, contains name and id.
Second file, address.csv, contains address info for each person. One person can have zero or multiple addresses.
Third file, employment.csv, contains employment info for each person. One person can have zero or multiple employers.
Here is some sample data.
person.csv:
"personID", "personName"
1, Joey
2, Chandler
3, Ross
4, Monica

address.csv:
"personID", "addressType", "state"
1, residence, NY
1, mailing, NC
2, residence, NY
4, residence, NY
4, mailing, DC

employment.csv:
"personID", "employerName"
1, emp1
2, emp2
2, emp3
3, emp4
Note: each file is sorted by person id.
To apply the business logic, I need to merge the person, address, and employment data for each person. Can you suggest an approach for this?
It sounds like a 4-step job. You'll have to decide where the intermediate results of steps 1 to 3 should reside.
If the data from all the CSV files will fit in memory, then the intermediate results of steps 1 to 3 could just be a Map, with personID as the key. If not, then the intermediate results of steps 1 to 3 should probably be written to a temp table in the database.
Assuming all data will fit in memory, create a bean which can be injected into the ItemWriters of steps 1 to 3, for example:
// in a config class...
// assuming PersonID is of type Long
// Assuming Person class has appropriate attributes
Map<Long, Person> people = new HashMap<>();
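The Person class assumed in the snippet above could look something like the following. This is only a sketch: the Address and Employment records, the PeopleStore holder, and all field names are hypothetical and simply mirror the CSV headers from the question. Records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row types; field names follow the CSV headers in the question.
record Address(long personId, String addressType, String state) {}

record Employment(long personId, String employerName) {}

// Hypothetical Person aggregate that steps 2 and 3 add to.
class Person {
    final long id;
    final String name;
    // Empty lists model "zero or multiple" addresses and employers.
    final List<Address> addresses = new ArrayList<>();
    final List<Employment> employments = new ArrayList<>();

    Person(long id, String name) {
        this.id = id;
        this.name = name;
    }
}

// Hypothetical holder for the shared map, keyed by personID.
class PeopleStore {
    final Map<Long, Person> people = new HashMap<>();
}
```

In a Spring configuration, the holder (or the map itself) would be exposed as a bean so that the same instance is shared by every step.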
Step 1:
ItemWriter - add the Person instance to the people Map (or intermediate table).
Step 2:
ItemWriter - add the address to the relevant person from the people Map (or intermediate table). TODO: what should happen if there is an Address for a person that does not exist?
Step 3:
ItemWriter - add the employment to the relevant person from the people Map (or intermediate table). TODO: what should happen if there is an Employment for a person that does not exist?
Since there is nothing for ItemProcessor to do in steps 1 to 3, it might be better to use a Tasklet.
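The write logic of steps 1 to 3 can be sketched in plain Java, leaving out the Spring Batch wiring. Everything here is an assumption for illustration: the MergeSketch class, its method names, and the choice to collect rows for unknown persons in an orphans list. That last part is only one possible answer to the TODOs; failing the step or skipping the row are equally valid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MergeSketch {
    // Minimal stand-in for the Person class (hypothetical fields).
    static class Person {
        final long id;
        final String name;
        final List<String> addresses = new ArrayList<>(); // e.g. "residence/NY"
        final List<String> employers = new ArrayList<>();

        Person(long id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    final Map<Long, Person> people = new HashMap<>();
    // Rows whose personID has no matching Person.
    final List<String> orphans = new ArrayList<>();

    // Step 1: register each row of person.csv.
    void writePerson(long id, String name) {
        people.put(id, new Person(id, name));
    }

    // Step 2: attach an address.csv row to its person.
    void writeAddress(long personId, String addressType, String state) {
        Person person = people.get(personId);
        if (person == null) {
            orphans.add("address for unknown person " + personId);
            return;
        }
        person.addresses.add(addressType + "/" + state);
    }

    // Step 3: attach an employment.csv row to its person.
    void writeEmployment(long personId, String employerName) {
        Person person = people.get(personId);
        if (person == null) {
            orphans.add("employment for unknown person " + personId);
            return;
        }
        person.employers.add(employerName);
    }
}
```

In a real job, each method body would sit inside the ItemWriter (or Tasklet) of the corresponding step, and it relies on step 1 finishing before steps 2 and 3 start.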
Also, steps 1 to 3 could be done in parallel. It would probably increase performance, but there would be added complexity to ensure people is correctly populated.
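One way to handle that added complexity, sketched under assumptions: use a ConcurrentHashMap and let whichever step first sees a personID create a placeholder entry, so an address or employment row may arrive before its person row. The ParallelMergeSketch class, the placeholder idea, and the method names are all hypothetical and not part of the original suggestion.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class ParallelMergeSketch {
    static class Person {
        final long id;
        // Set by the person step, possibly after child rows have arrived.
        volatile String name;
        final List<String> addresses = new CopyOnWriteArrayList<>();
        final List<String> employers = new CopyOnWriteArrayList<>();

        Person(long id) {
            this.id = id;
        }
    }

    final Map<Long, Person> people = new ConcurrentHashMap<>();

    // Atomically returns the existing entry or creates a placeholder for it.
    private Person entry(long personId) {
        return people.computeIfAbsent(personId, id -> new Person(id));
    }

    void writePerson(long id, String name) {
        entry(id).name = name;
    }

    void writeAddress(long personId, String address) {
        entry(personId).addresses.add(address);
    }

    void writeEmployment(long personId, String employer) {
        entry(personId).employers.add(employer);
    }
}
```

With this design, any entry whose name is still null after all three steps have finished corresponds to an address or employment row with no matching person, which can then be handled before step 4.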
Step 4:
ItemReader - read the next element of people (or a composite object from the intermediate tables).
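Reading "the next element" of the map can be sketched as a small iterator wrapper. It follows the Spring Batch ItemReader convention that read() returns null once the input is exhausted, but it is plain Java so that it stays self-contained. The PeopleReaderSketch name and the decision to iterate in personID order are assumptions.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

class PeopleReaderSketch<T> {
    private final Iterator<T> iterator;

    PeopleReaderSketch(Map<Long, T> people) {
        // Copy into a TreeMap so items come out in ascending personID order.
        this.iterator = new TreeMap<Long, T>(people).values().iterator();
    }

    // Returns the next merged item, or null when there are no more items.
    T read() {
        return iterator.hasNext() ? iterator.next() : null;
    }
}
```

The reader takes a snapshot of the map when it is constructed, so it should be created only after steps 1 to 3 have completed.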