
Spring Batch - How to set chunk > 1 when items have interdependencies?

I have a Spring Batch job that needs to write all items to a DB in one chunk. I want this behavior because

  1. I need to do some validation involving comparisons among the read items prior to writing.
  2. If any validation fails, I need to make sure that no items are persisted or all items are persisted.
  3. It's a small amount of data < 60kb.

The item is a Car and it has a Model property. The problem I'm having is that part of the item processing checks whether the Model already exists in the database. If it does, it retrieves the Model from the DB (including its auto-generated id) and therefore does not create another DB entry.

Now, that works fine if the Model was already persisted in a previous job run. However, it fails when the same Model shows up for two different Cars in the same job run and the Model was not persisted previously. In that case, two database rows get created for the same Model.

I'm using JPA, with Model having an id property annotated with @Id and @GeneratedValue(strategy = GenerationType.IDENTITY). How can I ensure prior to writing that the same Model is not written to the database more than once?

I suspect this is happening because each chunk runs in a single transaction: you're performing validation and writes for multiple items within that one transaction, and one item's write can invalidate the validation already done for another item.

If you really want to use chunk-oriented processing here, it seems like you just need to implement some sort of caching in your ItemProcessor. When you create a new Model, store an identifier for it in something like a Set. Every time the processor processes a Car item, it checks the cache, then checks the database for an existing Model, prior to creating a new one.
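A minimal sketch of that caching idea, with a plain Map standing in for the JPA repository so it runs without Spring (ModelDedupProcessor and all field names here are hypothetical; in a real job this logic would live in an ItemProcessor<Car, Car>):

```java
import java.util.HashMap;
import java.util.Map;

class Model {
    final String name;
    Model(String name) { this.name = name; }
}

class Car {
    final String modelName;
    Model model;
    Car(String modelName) { this.modelName = modelName; }
}

// Plain-Java sketch of the caching logic the ItemProcessor would hold.
// The "db" Map stands in for the real JPA repository lookup.
class ModelDedupProcessor {
    private final Map<String, Model> db;                      // pretend persistence layer
    private final Map<String, Model> cache = new HashMap<>(); // Models created in this run

    ModelDedupProcessor(Map<String, Model> db) { this.db = db; }

    Car process(Car car) {
        // 1. Check the in-run cache first, then the "database".
        Model model = cache.get(car.modelName);
        if (model == null) model = db.get(car.modelName);
        // 2. Only create a new Model if neither knows it, and remember it
        //    so the next Car with the same Model reuses this instance.
        if (model == null) {
            model = new Model(car.modelName);
            cache.put(car.modelName, model);
        }
        car.model = model;
        return car;
    }
}
```

With this in place, two Cars with the same unseen Model in one run end up sharing a single Model instance instead of triggering two inserts.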

Otherwise, you could avoid the chunk-oriented processing entirely by using a simple Tasklet step. With this, you would have full control over the logical transactions. The primary reason to do chunk-oriented processing is so that you can write a big chunk of items all at once to gain efficiency. Since you're doing extra validation with the database for each item, you're losing some of that benefit.
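The read-all, validate-all, write-all flow a Tasklet gives you can be sketched without Spring (in a real step this body would sit inside Tasklet.execute() and run within the step's transaction; the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the all-or-nothing flow a single Tasklet step would run inside
// one transaction: validate the whole batch against itself BEFORE any write.
class AllOrNothingTasklet {
    static List<String> execute(List<String> items) {
        List<String> seen = new ArrayList<>();
        for (String item : items) {
            if (seen.contains(item)) {
                // One bad item aborts the whole step. Nothing is persisted,
                // because no write has been issued yet.
                throw new IllegalStateException("duplicate item: " + item);
            }
            seen.add(item);
        }
        // Only now "write" everything (here: return the validated batch).
        return seen;
    }
}
```

Because validation of the entire set happens before the first write, either every item is persisted or none are, which matches requirements 1 and 2 from the question.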

If your Model table is small enough, you could cache it and then do all of the validation in-memory.

Here are two options to solve this issue:

  1. Create a first Step to read/process/write the Models.
  2. In the ItemProcessor<Car>, check if the Model is in the DB, as is currently being done. If it's not in the DB, write it to the DB before writing all of the Cars.

The disadvantages of option 1 are: 1) the file is read twice (once for Models and once for Cars); 2) if the Car step fails, you could be left with Models that have no associated Cars.

Option 2 will involve more hits to the DB. However, the number of records is small. Also, the Model writes to the DB happen in the same transaction as the writes for all of the Cars. So, if the Cars fail to persist, so will their associated Models.
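Option 2 can be sketched the same way, with a Map playing the role of the persistence context / repository (a rollback discards it wholesale, taking the Models down with the Cars; EagerModelWriter and its methods are hypothetical names):

```java
import java.util.HashMap;
import java.util.Map;

class Model {
    final String name;
    Model(String name) { this.name = name; }
}

// Sketch of option 2: persist a missing Model immediately in the processor,
// so a later Car in the same run finds it on lookup instead of inserting a
// duplicate. The Map stands in for the JPA repository / persistence context.
class EagerModelWriter {
    private final Map<String, Model> repository = new HashMap<>();

    // Look the Model up; create and "persist" it only if it is missing.
    Model findOrCreate(String name) {
        return repository.computeIfAbsent(name, Model::new);
    }

    int modelCount() { return repository.size(); }
}
```

The key property is that the second lookup for the same name returns the Model created by the first, so only one row would ever be inserted per Model.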
