I am seeking some guidance please on how to structure a spring batch application to ingest a bunch of potentially large delimited files, each with a different format.
The requirements are clear:
We have one step which uses a MultiResourceItemReader which processes files in sequence. The files are inputstreams which time out.
Ideally I think we want to have
Thanks in advance.
This is a fun one. I'd implement a customer line tokenizer that extends DelimitedLineTokenizer
and also implements LineCallbackHandler
. I'd then configure your FlatFileItemReader
to skip the first line (list of column names) and pass that first line to your handler/tokenizer to set all your token names.
A custom FieldSetMapper
would then receive a FieldSet
with all your name/value pairs, which I'd just pass to the ItemProcessor
. Your processor could then build your JSON strings and pass them off to your writer.
Obviously, you job falls into typical - reader -> processor -> writer category with writer being optional in your case ( if you don't wish to persist JSON before sending to RESTFul API) or you can call step to send JSON to REST Service as Writer
if Writer
is done after receiving response from service.
Anyway, you don't need a separate step to just know the file name. Make it part of application initialization code.
Strategies to parallelize your application are listed here .
You just said a bunch of files. If number of lines in those files have similar count, I would go by partitioning approach ( ie by implementing Partitioner
interface, I will hand over each file to a separate thread and that thread will execute a step - reader -> processor -> writer ). You wouldn't need MultiResourceItemReader
in this case but simple single file reader as each file will have its own reader. Partitioning
If line count in those files vary a lot ie if one file is going to take hours and another getting finished in few minutes, you can continue using MultiResourceItemReader
but use approach of Multi-threaded Step to achieve parallelism.This is chunk level parallelism so you might have to make reader thread safe.
Approach Parallel Steps doesn't look suitable for your case since your steps are not independent.
Hope it helps !!
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.