
Spring Batch structure for parallel processing

I am seeking some guidance, please, on how to structure a Spring Batch application to ingest a bunch of potentially large delimited files, each with a different format.

The requirements are clear:

  1. select the files to ingest from an external source: there can be multiple releases of some files each day, so the latest release must be picked
  2. turn each line of each file into JSON by combining the delimited fields with the column names from the first line (which is skipped)
  3. send each line of JSON to a RESTful API

We currently have one step which uses a MultiResourceItemReader and processes the files in sequence. The files are input streams, which time out.

Ideally I think we want to have

  1. a step which identifies the files to ingest
  2. a step which processes files in parallel

Thanks in advance.

This is a fun one. I'd implement a custom line tokenizer that extends DelimitedLineTokenizer and also implements LineCallbackHandler. I'd then configure your FlatFileItemReader to skip the first line (the list of column names) and pass that first line to your handler/tokenizer to set all your token names.
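A minimal sketch of that idea, assuming Spring Batch 4.x APIs and a comma delimiter (the class name and the factory method are mine, just for illustration):

```java
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.LineCallbackHandler;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.core.io.Resource;

// Tokenizer that learns its column names from the skipped header line.
public class HeaderSettingTokenizer extends DelimitedLineTokenizer
        implements LineCallbackHandler {

    @Override
    public void handleLine(String headerLine) {
        // FlatFileItemReader hands the skipped first line here;
        // use the header's column names as the token names.
        setNames(headerLine.split(","));
    }

    // Convenience factory showing how the reader is wired up.
    public static FlatFileItemReader<FieldSet> readerFor(Resource file) {
        HeaderSettingTokenizer tokenizer = new HeaderSettingTokenizer();

        DefaultLineMapper<FieldSet> lineMapper = new DefaultLineMapper<>();
        lineMapper.setLineTokenizer(tokenizer);
        lineMapper.setFieldSetMapper(new PassThroughFieldSetMapper());

        FlatFileItemReader<FieldSet> reader = new FlatFileItemReader<>();
        reader.setResource(file);
        reader.setLinesToSkip(1);                  // skip the header line...
        reader.setSkippedLinesCallback(tokenizer); // ...but feed it to the tokenizer
        reader.setLineMapper(lineMapper);
        return reader;
    }
}
```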

A custom FieldSetMapper would then receive a FieldSet with all your name/value pairs, which I'd just pass to the ItemProcessor. Your processor could then build your JSON strings and pass them off to your writer.
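In the sketch above I used PassThroughFieldSetMapper so the FieldSet reaches the processor unchanged. A processor along those lines might look like this, assuming Jackson is on the classpath (the class name is mine):

```java
import java.util.LinkedHashMap;
import java.util.Map;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.file.transform.FieldSet;

// Turns each FieldSet (header-named values) into a JSON string.
public class FieldSetToJsonProcessor implements ItemProcessor<FieldSet, String> {

    private final ObjectMapper objectMapper = new ObjectMapper();

    @Override
    public String process(FieldSet fieldSet) throws Exception {
        Map<String, String> row = new LinkedHashMap<>();
        for (String name : fieldSet.getNames()) {
            row.put(name, fieldSet.readString(name));
        }
        return objectMapper.writeValueAsString(row);
    }
}
```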

Obviously, your job falls into the typical reader -> processor -> writer category, with the writer being optional in your case (if you don't wish to persist the JSON before sending it to the RESTful API). Alternatively, the writer itself can send the JSON to the REST service, so that the write completes only once a response is received from the service.
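If you do make the REST call the writer, a minimal sketch using the Spring Batch 4.x ItemWriter signature (the endpoint URL is hypothetical):

```java
import java.util.List;

import org.springframework.batch.item.ItemWriter;
import org.springframework.http.HttpEntity;
import org.springframework.http.HttpHeaders;
import org.springframework.http.MediaType;
import org.springframework.web.client.RestTemplate;

// Posts each JSON line to the REST API; the chunk is "written"
// once every call has returned.
public class RestPostingItemWriter implements ItemWriter<String> {

    private final RestTemplate restTemplate = new RestTemplate();
    private final String endpoint; // e.g. "https://example.com/api/ingest" (hypothetical)

    public RestPostingItemWriter(String endpoint) {
        this.endpoint = endpoint;
    }

    @Override
    public void write(List<? extends String> jsonLines) {
        HttpHeaders headers = new HttpHeaders();
        headers.setContentType(MediaType.APPLICATION_JSON);
        for (String json : jsonLines) {
            restTemplate.postForEntity(endpoint, new HttpEntity<>(json, headers), String.class);
        }
    }
}
```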

Anyway, you don't need a separate step just to identify the file names. Make that part of your application initialization code.
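For example, a hypothetical helper that keeps only the latest release of each file, assuming releases share a base name and the newest one has the latest modification time (adjust the regex to your actual naming convention):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

public class FileSelector {

    // Among multiple releases of the same logical file, keep the newest.
    // Assumes a release suffix like "_20240101" in the name.
    public static List<File> latestReleases(File dir) {
        Map<String, File> latest = new HashMap<>();
        for (File f : Objects.requireNonNull(dir.listFiles(File::isFile))) {
            String baseName = f.getName().replaceAll("_\\d{8}", "");
            latest.merge(baseName, f,
                    (a, b) -> a.lastModified() >= b.lastModified() ? a : b);
        }
        return new ArrayList<>(latest.values());
    }
}
```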

Strategies to parallelize your application are listed in the Spring Batch reference documentation under Scaling and Parallel Processing.

You just said a bunch of files. If the line counts of those files are similar, I would go with the partitioning approach (i.e., by implementing the Partitioner interface, I would hand each file over to a separate thread, and that thread would execute a step: reader -> processor -> writer). You wouldn't need a MultiResourceItemReader in this case, but a simple single-file reader, since each file gets its own reader. See the Partitioning section of the docs.
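Spring Batch even ships a MultiResourcePartitioner that does exactly this: one partition per file, with the file stored in each partition's step execution context under the key fileName. A sketch with the 4.x builders (the bean names, the Resource[] of selected files, and the worker step are assumptions):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.core.partition.support.MultiResourcePartitioner;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.Resource;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

@Configuration
public class PartitionedIngestConfig {

    // Master step: one partition (and one worker thread) per input file.
    @Bean
    public Step masterStep(StepBuilderFactory steps, Step workerStep, Resource[] inputFiles) {
        MultiResourcePartitioner partitioner = new MultiResourcePartitioner();
        partitioner.setResources(inputFiles); // stores "fileName" in each partition's context
        return steps.get("masterStep")
                .partitioner("workerStep", partitioner)
                .step(workerStep)
                .taskExecutor(new SimpleAsyncTaskExecutor())
                .build();
    }

    // Each worker's reader picks up its own file from the step execution context.
    @Bean
    @StepScope
    public FlatFileItemReader<FieldSet> workerReader(
            @Value("#{stepExecutionContext['fileName']}") Resource file) {
        return HeaderSettingTokenizer.readerFor(file); // the reader sketched earlier
    }
}
```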

If the line counts vary a lot, i.e. if one file is going to take hours while another finishes in a few minutes, you can keep using MultiResourceItemReader but take the Multi-threaded Step approach to achieve parallelism. This is chunk-level parallelism, so you may have to make the reader thread-safe.
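A sketch of that variant, again with the 4.x builders, wrapping the reader in SynchronizedItemStreamReader to make it thread-safe (chunk size and throttle limit are illustrative):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.MultiResourceItemReader;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.batch.item.support.SynchronizedItemStreamReader;
import org.springframework.core.task.SimpleAsyncTaskExecutor;

public Step multiThreadedStep(StepBuilderFactory steps,
                              MultiResourceItemReader<FieldSet> multiReader,
                              ItemProcessor<FieldSet, String> processor,
                              ItemWriter<String> writer) {
    // Serialize access to the delegate so concurrent chunks can share it safely.
    SynchronizedItemStreamReader<FieldSet> syncReader = new SynchronizedItemStreamReader<>();
    syncReader.setDelegate(multiReader);

    return steps.get("multiThreadedStep")
            .<FieldSet, String>chunk(100)            // chunk size is illustrative
            .reader(syncReader)
            .processor(processor)
            .writer(writer)
            .taskExecutor(new SimpleAsyncTaskExecutor())
            .throttleLimit(8)                        // thread cap, also illustrative
            .build();
}
```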

The Parallel Steps approach doesn't look suitable for your case, since your steps are not independent.

Hope it helps!
