I'm new to Spring Batch and am looking for guidance on the requirement below.
I have to get data from different systems, apply some business logic, and save the result in a DB.
Below is an example.
I need to read data from 3 CSV files.
First file – person.csv – contains name and id.
Second file – address.csv – contains address info for each person. One person can have zero or multiple addresses.
Third file – employment.csv – contains employment info for each person. One person can have zero or multiple employers.
Here is some sample data.
person.csv:
"personID", "personName"
1, Joey
2, Chandler
3, Ross
4, Monica
address.csv:
"personID", "addressType", "state"
1, residence, NY
1, mailing, NC
2, residence, NY
4, residence, NY
4, mailing, DC
employment.csv:
"personID", "employerName"
1, emp1
2, emp2
2, emp3
3, emp4
Note: each file is sorted by person id.
To apply the business logic, I need to merge the person, address, and employment data for each person. Can you suggest an approach for this?
It sounds like a 4-step job. You'll have to decide where the intermediate results of steps 1 to 3 should reside.
If the data from all the CSV files will fit in memory, then the intermediate results of steps 1 to 3 could just be a Map, with personID as the key. If not, then the intermediate results of steps 1 to 3 should probably be written to a temp table in the database.
Assuming all data will fit in memory, create a bean which can be injected into the ItemWriters of steps 1 to 3, for example:
// In a @Configuration class.
// Assumes personID is of type Long and that the Person class has appropriate attributes.
@Bean
public Map<Long, Person> people() {
    return new HashMap<>();
}
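The Person class is only assumed above, so here is a rough sketch of what it might look like. The class and field names are hypothetical and simply mirror the CSV headers from the question; the empty lists cover the "zero or multiple" case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain classes; field names mirror the CSV headers in the question.
class Address {
    final String addressType;
    final String state;

    Address(String addressType, String state) {
        this.addressType = addressType;
        this.state = state;
    }
}

class Employment {
    final String employerName;

    Employment(String employerName) {
        this.employerName = employerName;
    }
}

class Person {
    final long personId;
    final String personName;
    // Both lists start empty, which models a person with no addresses or employers.
    final List<Address> addresses = new ArrayList<>();
    final List<Employment> employments = new ArrayList<>();

    Person(long personId, String personName) {
        this.personId = personId;
        this.personName = personName;
    }
}
```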
Step 1: read person.csv and write each person to the people Map (or intermediate table).
Step 2: read address.csv and add each address to the matching person in the people Map (or intermediate table). TODO: what should happen if there is an Address for a person that does not exist?
Step 3: read employment.csv and add each employment to the matching person in the people Map (or intermediate table). TODO: what should happen if there is an Employment for a person that does not exist?
Since there is nothing for ItemProcessor to do in steps 1 to 3, it might be better to use a Tasklet.
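Stripped of the Spring Batch plumbing, the work done by steps 1 to 3 could look roughly like the sketch below. It is plain Java, not Spring Batch API: PeopleLoader, its methods, and the trimmed-down Person are all made-up names, and collecting unmatched rows in an orphans list is just one possible answer to the TODOs (skipping the row or failing the job are others).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PeopleLoader {
    // Trimmed-down, hypothetical Person: just enough to show the merge.
    static class Person {
        final String name;
        final List<String> addresses = new ArrayList<>(); // "addressType/state"
        final List<String> employers = new ArrayList<>();

        Person(String name) {
            this.name = name;
        }
    }

    final Map<Long, Person> people = new HashMap<>();
    // Rows whose personID matched nobody. Collecting them is an assumption;
    // skipping them or failing the job would be equally valid choices.
    final List<String> orphans = new ArrayList<>();

    // Step 1: called once per person.csv row.
    void addPerson(long id, String name) {
        people.put(id, new Person(name));
    }

    // Step 2: called once per address.csv row.
    void addAddress(long id, String addressType, String state) {
        Person p = people.get(id);
        if (p == null) {
            orphans.add("address:" + id);
            return;
        }
        p.addresses.add(addressType + "/" + state);
    }

    // Step 3: called once per employment.csv row.
    void addEmployer(long id, String employerName) {
        Person p = people.get(id);
        if (p == null) {
            orphans.add("employment:" + id);
            return;
        }
        p.employers.add(employerName);
    }
}
```

In a real job each of these methods would be called from the ItemWriter (or Tasklet) of the corresponding step, with the steps run one after another so that every person exists before its addresses and employments arrive.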
Also, steps 1 to 3 could be done in parallel. It would probably increase performance, but there would be added complexity to ensure people is correctly populated.
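To give an idea of that added complexity: with parallel steps an address can arrive before its person, and several threads touch the same map. One way to cope, sketched below in plain Java, is a ConcurrentHashMap with computeIfAbsent so that whichever row arrives first creates the entry. ConcurrentPeople and its methods are hypothetical names, and leaving name as null until the person row shows up is an assumption of this sketch, not something the approach requires.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

class ConcurrentPeople {
    // Hypothetical, trimmed-down Person that is safe to fill from several threads.
    static class Person {
        volatile String name; // stays null until the person.csv row arrives
        final List<String> addresses = new CopyOnWriteArrayList<>();
        final List<String> employers = new CopyOnWriteArrayList<>();
    }

    private final Map<Long, Person> people = new ConcurrentHashMap<>();

    // Whichever step sees a personID first creates the entry, atomically.
    private Person entry(long id) {
        return people.computeIfAbsent(id, key -> new Person());
    }

    void addPerson(long id, String name) {
        entry(id).name = name;
    }

    void addAddress(long id, String address) {
        entry(id).addresses.add(address);
    }

    void addEmployer(long id, String employer) {
        entry(id).employers.add(employer);
    }

    Person get(long id) {
        return people.get(id);
    }
}
```

Once all three steps have finished, any entry whose name is still null is an address or employment row without a matching person, so the "person does not exist" question gets decided at that point instead of during loading.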
Step 4: read each person from people (or the composite object from the intermediate tables), apply the business logic, and save the result in the DB.
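As a rough illustration of step 4 in plain Java (not Spring Batch API): iterate over the merged people, apply the business logic to each, and collect what the writer would save. Step4Sketch, process, and the one-line summary it produces are placeholders, since the actual business logic is not described in the question. In Spring Batch itself, the reader for this step could probably be something like a ListItemReader over people.values().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Step4Sketch {
    // Hypothetical, trimmed-down Person as it looks after steps 1 to 3.
    static class Person {
        final String name;
        final List<String> addresses = new ArrayList<>();
        final List<String> employers = new ArrayList<>();

        Person(String name) {
            this.name = name;
        }
    }

    // Placeholder business logic: summarize one fully merged person.
    static String process(long id, Person p) {
        return id + "," + p.name + "," + p.addresses.size() + "," + p.employers.size();
    }

    // Processes every merged person and returns the rows a writer would save.
    static List<String> run(Map<Long, Person> people) {
        List<String> results = new ArrayList<>();
        // The TreeMap copy gives personID order; a HashMap has no guaranteed order.
        for (Map.Entry<Long, Person> e : new TreeMap<>(people).entrySet()) {
            results.add(process(e.getKey(), e.getValue()));
        }
        return results;
    }
}
```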