
Spring Batch: Propagate exception encountered in partitioned step (stop job execution)

Background

I currently have a Spring Batch job that reads a flat file. The job uses a MultiResourcePartitioner to read physical partitions of a file that has been split into N smaller files. Each physical partition of the file therefore results in a new slave step being executed that reads that partition.
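The setup above can be sketched roughly as follows (Spring Batch 4 style). Bean names, the file pattern, and the grid size are illustrative assumptions, not taken from the question:

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
import org.springframework.core.io.support.PathMatchingResourcePatternResolver;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

public class PartitionedJobConfig {

    public MultiResourcePartitioner partitioner() throws Exception {
        MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
        // Each matching file becomes one partition; the resource URL is put
        // into the worker's step execution context under the key "fileName".
        partitioner.setResources(new PathMatchingResourcePatternResolver()
                .getResources("file:input/split-*.csv")); // assumed location
        return partitioner;
    }

    public Step managerStep(StepBuilderFactory steps, Step workerStep) throws Exception {
        return steps.get("managerStep")
                // One worker ("slave") step execution per physical partition
                .partitioner("workerStep", partitioner())
                .step(workerStep) // its reader binds #{stepExecutionContext['fileName']}
                .gridSize(4)
                .taskExecutor(new SimpleAsyncTaskExecutor())
                .build();
    }
}
```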

The problem

If there is any issue reading a physical partition, the execution of that slave step fails and the exception is logged by Spring Batch. This does not impact the execution of the remaining slave steps that are reading other physical partitions of the file; however, this is not the desired behavior. What I want is that if there is an issue reading a particular physical partition (for example, failing to parse a particular column), the exception should be propagated to the location where the Job was launched so that I can halt any further processing.

The current implementation of the execute method in AbstractStep catches Throwable and suppresses the exception by logging it. As a result, the exception is not propagated to the location where the Job was launched and there is no way to halt the execution of the remaining slave steps.
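Although the exception is not rethrown, it is still recorded on the executions. A sketch of inspecting it at the launch site, assuming an already-configured `JobLauncher` and `Job`:

```java
// JobLauncher.run() returns normally even when a partition fails; the
// failure is visible on the returned JobExecution instead.
JobExecution execution = jobLauncher.run(job, new JobParameters());
if (execution.getStatus() == BatchStatus.FAILED) {
    // Collects failures from the job and all step executions, including
    // the exception thrown in the failed partition's worker step.
    for (Throwable failure : execution.getAllFailureExceptions()) {
        failure.printStackTrace();
    }
}
```

This only reports the failure after all slave steps have finished; it does not by itself halt the remaining partitions.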

How can I make spring-batch propagate any exception that occurs in a slave step all the way to the location where the Job was launched? I want to do this so that I can halt any further processing if there is an issue processing any of the partitioned files.

Answer

If there is any issue reading any physical partition, the execution of that slave step will fail and the exception will be logged by spring batch. This does not impact the execution of the remaining slave steps that are reading different physical partitions of the file; however, this is not the desired behavior.

I would argue that "this does not impact the execution of the remaining slave steps" is in fact the desired behaviour. The idea behind partitioning a big chunk of work into smaller tasks executed in parallel is that the tasks should be independent of each other, so one failure should not impact the others. If your logic requires the failure of one task to stop the other tasks, then the tasks are not well defined as independent units, and executing them in a local/remote partitioned step is not the appropriate choice in the first place.

What I want is that if there is an issue reading a particular physical partition (Example : not being able to parse a particular column), the exception should be propagated to the location where the Job was launched so that I can halt any further processing.

You need a custom PartitionHandler for that. This is the piece that coordinates worker steps. The default behaviour is to wait for all worker steps to finish and aggregate the results before reporting to the main job. Your custom implementation should detect the failure of any worker step and inform the others to stop.
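A minimal fail-fast sketch of such a handler, assuming local (thread-based) partitioning. The class name and cancellation strategy are hypothetical; this is not the behaviour of the default TaskExecutorPartitionHandler, and error handling is simplified:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.StepExecution;
import org.springframework.batch.core.partition.PartitionHandler;
import org.springframework.batch.core.partition.StepExecutionSplitter;

public class FailFastPartitionHandler implements PartitionHandler {

    private final Step workerStep;
    private final int gridSize;

    public FailFastPartitionHandler(Step workerStep, int gridSize) {
        this.workerStep = workerStep;
        this.gridSize = gridSize;
    }

    @Override
    public Collection<StepExecution> handle(StepExecutionSplitter splitter,
                                            StepExecution managerStepExecution) throws Exception {
        Set<StepExecution> partitions = splitter.split(managerStepExecution, gridSize);
        ExecutorService executor = Executors.newFixedThreadPool(gridSize);
        ExecutorCompletionService<StepExecution> completion =
                new ExecutorCompletionService<>(executor);
        for (StepExecution partition : partitions) {
            completion.submit(() -> {
                // Step.execute() does not rethrow reader exceptions (see
                // AbstractStep), so the status is the failure signal.
                workerStep.execute(partition);
                return partition;
            });
        }
        List<StepExecution> results = new ArrayList<>();
        try {
            for (int i = 0; i < partitions.size(); i++) {
                StepExecution result = completion.take().get();
                results.add(result);
                if (result.getStatus() == BatchStatus.FAILED) {
                    // First failure: interrupt the remaining worker threads.
                    executor.shutdownNow();
                    break;
                }
            }
        } finally {
            executor.shutdown();
        }
        return results;
    }
}
```

Note that interrupting in-flight workers only stops them at an interruptible point, and any partitions that were cancelled will simply be missing from the aggregated results.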

Moreover, this design of stopping or failing all workers when one of them fails does not play well with job restart. Restarting the job would re-run all partitions, which defeats one of the main points of a partitioned job in the first place: on restart, only the failed partitions should be re-executed.
