I have a Spring Batch job that needs to write all items to a DB in one chunk. I want this behavior because
The item is a Car and it has a Model property. The problem that I'm having is that part of the item processing checks to see if the Model already exists in the database. If it does, it retrieves the model from the DB (including the auto-generated id) and therefore does not create another DB entry.
Now, that works fine if the Model was already persisted in a previous job run. However, it fails when the same Model shows up for two different Cars in the same job run and the Model was not persisted previously. In that case, two database rows get created for the same Model.
I'm using JPA with Model having an id property annotated with @Id and @GeneratedValue(strategy = GenerationType.IDENTITY). How can I ensure prior to writing that the same Model is not written to the database more than once?
I suspect this is happening because each chunk runs in a single transaction: you're performing validation and writes for multiple items within that one transaction, and a pending write for one item can invalidate the validation already done for another.
If you really want to use chunk-oriented processing in this case, it seems like you just need to implement some sort of caching solution in your ItemProcessor. When you create a new Model, you would then store an identifier for that model in something like a Set. Every time the processor processes a Car item, it checks the cache, then checks the database for an existing model prior to creating a new one.
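A minimal sketch of that cache-then-database lookup, in plain Java so it runs without Spring. The Car and Model classes, the name field used as the de-duplication key, and the findInDb lookup function are all assumptions for illustration, not the asker's actual code. It also uses a Map rather than a Set so that later Cars can reuse the first Model instance instead of only knowing that one exists.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical stand-ins for the JPA entities described in the question.
class Model {
    Long id;            // would be assigned by the database on insert
    final String name;  // assumed natural key for de-duplication
    Model(String name) { this.name = name; }
}

class Car {
    final String vin;
    Model model;
    Car(String vin, Model model) { this.vin = vin; this.model = model; }
}

// Plays the role of an ItemProcessor for Car items; not tied to Spring Batch here.
class CarModelResolver {
    // Models already seen during this job run, keyed by the assumed natural key.
    private final Map<String, Model> seen = new HashMap<>();
    // Hypothetical database lookup: returns the persisted Model, if any.
    private final Function<String, Optional<Model>> findInDb;

    CarModelResolver(Function<String, Optional<Model>> findInDb) {
        this.findInDb = findInDb;
    }

    Car process(Car car) {
        Model incoming = car.model;
        // 1) cache hit, else 2) database hit, else 3) keep the new, unsaved Model.
        Model resolved = seen.computeIfAbsent(incoming.name,
                key -> findInDb.apply(key).orElse(incoming));
        car.model = resolved;
        return car;
    }
}
```

With this in place, two Cars that arrive with equal Model names end up pointing at the same Model object, so a writer that persists each distinct Model once would produce a single row. The cache would need to be scoped to a single job run (for example by creating a fresh processor per run) so stale entries do not leak between executions.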
Otherwise, you could avoid the chunk-oriented processing entirely by using a simple Tasklet step. With this, you would have full control over the logical transactions. The primary reason to do chunk-oriented processing is so that you can write a big chunk of items all at once to gain efficiency. Since you're doing extra validation with the database for each item, you're losing some of that benefit.
If your Model table is small enough, you could cache it and then do all of the validation in-memory.
Here are two options to solve for this issue:
1. Use a separate Step to read/process/write Models.
2. In the ItemProcessor<Car>, check if the Model is in the DB as is currently being done. If it's not in the DB, go ahead and write it to the DB prior to writing all of the Cars.
The disadvantages of option 1 are that: 1) the file is read twice (once for Models and once for Cars), and 2) if the Car step fails, you could have Models with no associated Cars.
Option 2 will involve more hits to the DB. However, the number of records is small. Also, the Model writes to the DB are in the same transaction as the write for all of the Cars. So, if Cars fail to persist, so will their associated Models.
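Option 2 can be sketched roughly as follows. This is plain Java rather than real Spring Batch or JPA code: the Store interface, its findModelByName and insertModel methods, and the InMemoryStore stand-in are hypothetical, and the sketch assumes the lookup can see rows inserted earlier in the same transaction.

```java
import java.util.ArrayList;
import java.util.List;

class ModelFirstSketch {
    static class Model {
        Long id;            // assigned on insert
        final String name;  // assumed lookup key
        Model(String name) { this.name = name; }
    }

    static class Car {
        final String vin;
        Model model;
        Car(String vin, Model model) { this.vin = vin; this.model = model; }
    }

    // Hypothetical persistence port; a real job would back this with JPA.
    interface Store {
        Model findModelByName(String name); // returns null when absent
        void insertModel(Model model);      // assigns the generated id
    }

    // In-memory stand-in so the sketch runs without a database.
    static class InMemoryStore implements Store {
        final List<Model> models = new ArrayList<>();

        public Model findModelByName(String name) {
            for (Model m : models) {
                if (m.name.equals(name)) return m;
            }
            return null;
        }

        public void insertModel(Model model) {
            model.id = (long) (models.size() + 1);
            models.add(model);
        }
    }

    // Plays the role of the ItemProcessor for Car items.
    static class CarProcessor {
        private final Store store;
        CarProcessor(Store store) { this.store = store; }

        Car process(Car car) {
            Model existing = store.findModelByName(car.model.name);
            if (existing == null) {
                // Write the Model immediately so the next Car's lookup finds it.
                store.insertModel(car.model);
            } else {
                car.model = existing;
            }
            return car;
        }
    }
}
```

Each Car costs one lookup and possibly one insert, which is the extra DB traffic mentioned above; in exchange, the second Car with the same Model finds the row written for the first.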